Abstract

Many  parallel  programs  are  written  in  SPMD  style,  i.e.  by  running  the  same  sequential  program  on  all 
processes.  SPMD  programs  include  synchronization,  but  it  is  easy  to  write  incorrect  synchronization  patterns. 
We  propose  a  system  that  verifies  a  program’s  synchronization  pattern.  We  also  propose  language  features  to 
make  the  synchronization  pattern  more  explicit  and  easily  checked.  We  have  implemented  a  prototype  of  our 
system  for  Split-C  and  successfully  verified  the  synchronization  structure  of  realistic  programs. 


1  Introduction 

Explicitly-parallel  programming — where  the  programmer  specifies  the  parallelism  in  a  computation — is  arguably 
the  most  widely  used  parallel  programming  paradigm.  Despite  many  years  of  practical  experience,  there  has  been 
little  work  on  the  static  semantics  of  explicitly-parallel  programming  languages.  We  propose  a  static  semantics  for 
global  synchronization  that  guarantees  an  explicitly  parallel  program  has  no  global  synchronization  errors.  Our 
proposal  is  based  on  a  formalization  of  widespread  programming  practices.  We  have  proven  the  soundness  of 
our  method  and  implemented  a  prototype  system.  Experimental  evidence  gathered  from  testing  our  system  on  a 
benchmark  suite  supports  our  hypothesis  that  the  global  synchronization  structure  of  realistic  programs  can  be 
formalized  and  automatically  verified. 

Our system was developed in the context of a distributed memory, shared address space programming language
(Split-C, an SPMD language developed at Berkeley [5]), but we found it equally applicable to checking the
synchronization structure of shared memory, shared address space parallel programs; our method can show the
synchronization correctness of the SPLASH-2 [25] benchmarks. We expect a similar result should hold for pure message
passing programs as well, but such programs may not rely on global synchronization to the same degree as programs
written using a shared address space. Note, however, that standard message passing libraries such as MPI [20]
include global synchronization primitives.

1.1  Global  Synchronization 

A  simple  and  popular  parallel  programming  model  is  SPMD  (for  Single  Program,  Multiple  Data).  SPMD  programs 
are  explicitly-parallel  programs  written  in  sequential  languages  extended  with  communication  and  synchronization 
primitives.  A  typical  SPMD  code  skeleton  is 

    work1();
    barrier;
    work2();
    barrier;
    work3();

where barrier is an operation that causes a process to block until all other processes have also reached a barrier.
In SPMD execution, all processes execute a copy of the program independently. In this example, the barriers serve
to guarantee that, e.g., all processes are done with work1() before proceeding to work2(). The only synchronization
is at the barriers; processes execute workn() asynchronously.

Figure 1: Examples

(a) processes left behind

    if (random()) barrier;

(b) processes "trapped" in a loop

    while (random()) barrier;

(c) conflicting barrier/broadcast

    if (random()) barrier else broadcast;

(d) correct but not structurally correct

    a <- random();
    if (a) barrier;        (*)
    x <- x + 1;
    if (not a) barrier;    (*)

(e) correct if processes agree on x's value

    work1(); barrier;
    if (x) barrier else work();
    work2();

(f) correct loop

    work1(); barrier;
    while i < 10
    work2(); barrier;
    (if (i = 1) barrier;
    work3();
    i <- i + 1);
    barrier

(g) if with matching barriers

    if (random())
        (barrier; barrier)
    else
        (work1(); barrier; work2(); barrier)

(h) structurally correct but not verifiable

    if (random())
        (while (i < 10) (barrier; i <- i + 1))
    else
        (j <- i + 10;
         while (j < 20) (work1(); barrier; j <- j + 1))

*This material is based in part upon work supported by DARPA contract F30602-95-C-0136.

While conceptually simple, the combination of asynchronous execution and explicit global synchronization
introduces subtle issues of program structure and correctness. Figure 1 gives examples illustrating correct and incorrect
synchronization patterns. In these examples, random() returns a different value in every process (causing different
branch decisions in different processes) and workn() is a function call that does no synchronization. In all of the
examples barriers are executed conditionally; we have observed that almost all SPMD programs have conditional
synchronization.

There are two basic forms of incorrect synchronization. In Figure 1a, processes execute different numbers of
barriers, causing the program to "hang" when some processes terminate while others wait at a barrier. The
same problem occurs in loops containing barriers if processes execute differing numbers of iterations (Figure 1b).
The second problem is illustrated by Figure 1c, where some processes execute a barrier while others execute a
broadcast. In SPMD languages, simultaneously executing different synchronization operations causes a runtime
error (or, in some implementations, undefined behavior).

Even correct SPMD synchronization can be subtle. Figure 1e is correct, provided that the value of variable x
(which is a replicated variable, i.e., each process has a variable x local to the process) is the same in all processes.
This pattern (conditional synchronization where the program's design guarantees processes make the same branch
decisions) is ubiquitous in SPMD programs. Figure 1f gives a more complex example illustrating the same point.
However, in correct programs processes need not always make the same branch decisions, as Figures 1d, 1g, and 1h
show.
1.2  Synchronization  Verification


Figure 1e shows that an important component of understanding synchronization behavior is knowing which
replicated variables must have the same value in all processes. We call such variables single-valued¹. Replicated
variables that may have different values in different processes are multi-valued. In practice, SPMD programmers
use synchronization in a highly structured way. All SPMD programs we have seen observe the following notion of
synchronization correctness.

Definition 1.1 (Structural Correctness) An expression is structurally correct if all subexpressions e satisfy the
following: Let V be the set of single-valued variables on entry to e. If processes begin execution of e in environments
that agree on the values of V and all processes terminate (i.e., no process loops), then all processes execute the
same sequence of synchronization operations.

It is easy to check that Figures 1f, 1g, and 1h are all structurally correct and that Figure 1e is structurally correct
assuming x is single-valued. Figure 1d is an example of a program that has no synchronization errors but is not
structurally correct (because of the expressions marked (*)).

1.3  Barrier  Inference 

We have developed a static semantics that verifies that a program has structurally correct synchronization. Since
barriers are the most common form of SPMD synchronization, we call this process barrier inference. Statically
checking synchronization behavior guarantees that programs never fail by "hanging" or executing conflicting
synchronization operations. SPMD programmers do make such mistakes², and our techniques eliminate this class of
bugs. Equally important, our method makes explicit the heretofore implicit assumptions about single-valued
variables in SPMD programs. In our experience, this extra information is extremely useful for understanding SPMD
programs written by others.

There are structurally correct programs that our system cannot verify, such as Figure 1h. Intuitively, the problem
with this example is that although both branches execute the same number of barriers, our system can only infer
that the branches each execute some unknown number of barriers and cannot tell that these numbers are the
same. In contrast, our system has no difficulty with Figure 1g, where the system can infer that both branches
execute exactly two barriers. While we have seen examples similar to Figure 1g, we have seen no programs with
the structure of Figure 1h.

We present our barrier inference algorithm, which statically verifies the correctness of an SPMD program's
synchronization behavior (Section 3), along with a proof of its soundness (Section 3.1). We also propose language features
that make the synchronization structure of SPMD programs explicit (Section 4.1). We have implemented a
prototype system to validate the algorithm and to empirically study the proposed language features. We tested the
prototype on a substantial number of programs written in Split-C (Section 5). Experience with our implementation
is positive; the system successfully checks the benchmarks with a few minor modifications to the programs,
including one to correct a bug detected by our system. We have also examined the SPLASH-2 benchmarks [25] by hand
and found that all but one can be checked with our system (Section 5.2). These experiments were for medium-size
programs; we believe that static verification of synchronization is especially important for larger systems because
these are not amenable to manual verification, and also for higher-order languages (e.g., parallel object-oriented
languages) where control flow is less explicit.

¹ A formal definition of single-valued variables is subtle; see Section 3.1.

² It is difficult to provide direct evidence for this claim, but we have committed such programming mistakes ourselves and found
them in existing, presumably debugged, programs.
2  The  Language


We present our system using $\mathcal{L}$, a small procedural language extended with three parallel operations: barrier,
broadcast (which is like barrier except a distinguished value is sent to all processes), and communicate (which
allows asynchronous communication). As our interest is in synchronization operations such as barrier and
broadcast, we leave the semantics of communicate unspecified. The grammar for $\mathcal{L}$ is:

    Expr ::= i
          |  id
          |  barrier
          |  broadcast
          |  communicate
          |  id(Expr, ..., Expr)
          |  id <- Expr
          |  if Expr Expr else Expr
          |  Expr; Expr
          |  let id in Expr
          |  letrec id(id, ..., id) = Expr in Expr

All values in $\mathcal{L}$ are integers and all variables are replicated. A let introduces a new integer variable and a letrec
introduces a potentially recursive function definition; the other expressions are also standard. There are some
predefined functions, such as +, which are all mathematical functions, i.e., their result depends solely on the value
of their arguments. In examples we write while e1 e2 as shorthand for

    letrec f() = if e1 (e2; f()) else 0 in f()
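The while shorthand can be sketched as a small desugaring step. The tuple-based AST below (tags such as "letrec", "call", "seq") is a hypothetical encoding chosen only for illustration; it is not the representation used by the prototype. The sketch also assumes the caller supplies a function name that is not used elsewhere, in line with the unique-names assumption made for the semantics.

```python
# Hypothetical tuple-based AST; every constructor tag here is illustrative.
def desugar_while(cond, body, fname="f"):
    """Rewrite `while cond body` into
    letrec f() = if cond (body; f()) else 0 in f().
    `fname` is assumed to be fresh (no other function has this name)."""
    call = ("call", fname, [])                      # f()
    loop_body = ("if", cond, ("seq", body, call), ("int", 0))
    return ("letrec", fname, [], loop_body, call)   # letrec f() = ... in f()


# while (i < 10) barrier
loop = desugar_while(("prim", "<", [("id", "i"), ("int", 10)]), ("barrier",))
```

Because the loop becomes an ordinary recursive function, any analysis that handles letrec and if handles while for free.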

This sparse language is sufficient to illustrate the novel aspects of our techniques. In Section 4.3 we discuss
extensions to the C and FORTRAN-based languages used in practice. Figure 2 gives a simple rewrite semantics
for $\mathcal{L}$ in a variation of continuation-passing style (CPS). The computation of a single process is a sequence of steps:

\[ \mathit{State} \to \mathit{State} \]

where a state in $\mathit{FunEnv} \times \mathit{Env} \times \mathit{Cont} \times \mathit{Expr}$ consists of an expression e to be evaluated, environments for the variables
and function names in scope at e, and the computation to perform after evaluating e (a continuation). Readers
familiar with CPS semantics will note that this CPS semantics is non-standard, because a continuation is a function
that returns only the next state in the computation, rather than the final answer of the entire computation. This
modification exposes intermediate states of the computation, which is needed to define the semantics of barrier
and broadcast.

The semantics of $\mathcal{L}$ models synchronization structure, but not the details of the communication primitives. The
synchronization primitives, barrier and broadcast, are the only operations that require global interaction. For
barrier, once all processes reach a barrier each process proceeds with its continuation. The rule for broadcast
is identical. The values returned by the communication operations are predicted by an oracle() function. The only
place where the communicated value is important is in broadcast: it returns the same value in all processes, but
the actual value is not important for synchronization verification. The barrier operation does not communicate
any values, so its result is always 0 (an arbitrary choice).

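A few other comments on details of the semantics are necessary. For simplicity, we assume that variables and
functions are given unique names (i.e., no names hide names in outer scopes). This property can always be
enforced by suitably renaming variables. Define $FF(f)$ as the set of function names in scope at $f$'s definition;
$FV(f)$ is the set of identifiers (other than $f$'s formal parameters) in scope at $f$'s definition. Figure 2 uses several
operations on environments. The set $\mathrm{dom}(E)$ is the domain of $E$. The environment $E|_V$ is $E$ with the domain
restricted to variables $V$. The environment $E // V$ is $E$ with variables $V$ removed; i.e., $E|_{\mathrm{dom}(E) - V}$. The
environment $E_1 + E_2$ is the combination of two environments $E_1$ and $E_2$ with disjoint domains.

\[
\begin{array}{rcl}
F \in \mathit{FunEnv} &=& \mathit{FunctionName} \to \mathit{FunctionDefinition} \\
E \in \mathit{Env} &=& \mathit{Var} \to \mathcal{N} \\
C \in \mathit{Cont} &=& \mathit{Env} \times \mathcal{N} \to \mathit{State} \\
\mathit{State} &=& \mathit{FunEnv} \times \mathit{Env} \times \mathit{Cont} \times \mathit{Expression} \;+\; \mathit{Env} \times \mathcal{N}
\end{array}
\]

\[
\begin{array}{l}
\langle F, E, C, i \rangle \to C(E, i) \\[4pt]
\langle F, E, C, x \rangle \to C(E, E(x)) \\[4pt]
\langle F, E, C, \mathtt{communicate} \rangle \to C(E, \mathit{oracle}()) \\[4pt]
\langle F, E, C_0, f(\mathit{Expr}_1, \ldots, \mathit{Expr}_n) \rangle \to \langle F, E, C_1, \mathit{Expr}_1 \rangle \quad \text{where} \\
\qquad F(f) = f(x_1, \ldots, x_n) = \mathit{Expr} \\
\qquad C_1 = \lambda E_2, v_1.\, \langle F, E_2, C_2, \mathit{Expr}_2 \rangle \\
\qquad \ldots \\
\qquad C_{n-1} = \lambda E_n, v_{n-1}.\, \langle F, E_n, C_n, \mathit{Expr}_n \rangle \\
\qquad C_n = \lambda E_{n+1}, v_n.\, \langle F|_{FF(f)}, E_0, C', \mathit{Expr} \rangle \\
\qquad E_0 = (E_{n+1}|_{FV(f)})[x_1 \leftarrow v_1, \ldots, x_n \leftarrow v_n] \\
\qquad C' = \lambda E', v.\, C_0\big((E_{n+1} // FV(f)) + E' // \{x_1, \ldots, x_n\},\; v\big) \\[4pt]
\langle F, E, C_0, p(\mathit{Expr}_1, \ldots, \mathit{Expr}_n) \rangle \to \langle F, E, C_1, \mathit{Expr}_1 \rangle \quad \text{where } p \text{ is a primitive} \\
\qquad C_1 = \lambda E_2, v_1.\, \langle F, E_2, C_2, \mathit{Expr}_2 \rangle \\
\qquad \ldots \\
\qquad C_{n-1} = \lambda E_n, v_{n-1}.\, \langle F, E_n, C_n, \mathit{Expr}_n \rangle \\
\qquad C_n = \lambda E_{n+1}, v_n.\, C_0(E_{n+1}, p(v_1, \ldots, v_n)) \\[4pt]
\langle F, E, C, x \leftarrow \mathit{Expr} \rangle \to \langle F, E, \lambda E', v.\, C(E'[x \leftarrow v], v), \mathit{Expr} \rangle \\[4pt]
\langle F, E, C, \mathtt{if}\ \mathit{Expr}_1\ \mathit{Expr}_2\ \mathtt{else}\ \mathit{Expr}_3 \rangle \to \langle F, E, C_0, \mathit{Expr}_1 \rangle \quad \text{where} \\
\qquad C_0 = \lambda E', v.\, \langle F, E', C, \text{if } v \neq 0 \text{ then } \mathit{Expr}_2 \text{ else } \mathit{Expr}_3 \rangle \\[4pt]
\langle F, E, C, \mathit{Expr}_1; \mathit{Expr}_2 \rangle \to \langle F, E, \lambda E', v.\, \langle F, E', C, \mathit{Expr}_2 \rangle, \mathit{Expr}_1 \rangle \\[4pt]
\langle F, E, C, \mathtt{let}\ x\ \mathtt{in}\ \mathit{Expr} \rangle \to \langle F, E[x \leftarrow 0], \lambda E', v.\, C(E' // \{x\}, v), \mathit{Expr} \rangle \\[4pt]
\langle F, E, C, \mathtt{letrec}\ f(x_1, \ldots, x_n) = \mathit{Expr}_1\ \mathtt{in}\ \mathit{Expr}_2 \rangle \to \langle F[f \leftarrow f(x_1, \ldots, x_n) = \mathit{Expr}_1], E, C, \mathit{Expr}_2 \rangle \\
\qquad FF(f) = \mathrm{dom}(F), \quad FV(f) = \mathrm{dom}(E) \\[4pt]
[\langle F_1, E_1, C_1, \mathtt{barrier} \rangle, \ldots, \langle F_n, E_n, C_n, \mathtt{barrier} \rangle] \to [C_1(E_1, 0), \ldots, C_n(E_n, 0)] \\[4pt]
[\langle F_1, E_1, C_1, \mathtt{broadcast} \rangle, \ldots, \langle F_n, E_n, C_n, \mathtt{broadcast} \rangle] \to [C_1(E_1, v), \ldots, C_n(E_n, v)] \quad \text{where } v = \mathit{oracle}()
\end{array}
\]

Figure 2: Semantics for $\mathcal{L}$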

The result of a (terminating) sequence of rewrites is an environment recording the final state of the program and
an integer result. The computation of n processes executing in parallel is a sequence of steps:

\[ \mathit{State}^n \to \mathit{State}^n \]

The transitions for vectors of states include the synchronization rules for barrier and broadcast, plus a general
rule for interleaving the transitions of individual processes:

\[ [S_1, \ldots, S_{i-1}, S_i, S_{i+1}, \ldots, S_n] \to [S_1, \ldots, S_{i-1}, S_i', S_{i+1}, \ldots, S_n] \]

whenever $S_i \to S_i'$. Let $I$ be the initial continuation $\lambda E, v.\,(E, v)$. The evaluation of an expression e on n processors
is

\[ [\langle \emptyset, \{\mathit{pid} = 1\}, I, e \rangle, \;\ldots\; n \text{ times} \;\ldots, \langle \emptyset, \{\mathit{pid} = n\}, I, e \rangle] \to^* [(E_1, i_1), \ldots, (E_n, i_n)] \]

The  initial  environment  of  each  process  contains  a  process  id  in  the  variable  pid.  This  value  distinguishes  one 
process  from  another. 

If all processes halt with a final environment and integer value then that run is successful. A run is unsuccessful if
(1) processes execute a different number of barriers (Figures 1a and 1b), (2) some processes reach a barrier at
the same time others reach a broadcast (Figure 1c), or (3) one or more processes loop. Our methods are capable
of statically checking realistic programs for (1) and (2).
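To make conditions (1) and (2) concrete, the following is a minimal sketch of a scheduler that runs n processes and classifies the run. It is not the rewrite semantics of Figure 2: each process is modeled as a Python generator that yields "barrier" or "broadcast" at a synchronization point, the names `run`, `fig_1a`, `fig_1c`, and `fig_1g` are hypothetical, and `pid % 2` stands in for random(). Condition (3), divergence, is not modeled; a process that loops without yielding would make the sketch loop as well.

```python
def run(program, n):
    """Run program(pid) on n processes and classify the run as
    "success", "hang" (condition 1), or "conflict" (condition 2)."""
    procs = [program(pid) for pid in range(1, n + 1)]
    waiting = {}   # process index -> synchronization operation it is blocked on
    done = set()   # indices of terminated processes

    def advance(k):
        # Let process k run asynchronously until its next synchronization point.
        try:
            waiting[k] = next(procs[k])
        except StopIteration:
            done.add(k)

    for k in range(n):
        advance(k)
    while waiting:
        if done:
            return "hang"       # some processes terminated while others wait
        if len(set(waiting.values())) > 1:
            return "conflict"   # barrier and broadcast reached simultaneously
        for k in list(waiting):  # global rule: every process continues
            del waiting[k]
            advance(k)
    return "success"


def fig_1a(pid):
    if pid % 2:                  # stands in for random()
        yield "barrier"

def fig_1c(pid):
    yield "barrier" if pid % 2 else "broadcast"

def fig_1g(pid):
    if pid % 2:
        yield "barrier"
        yield "barrier"
    else:
        yield "barrier"          # work1() omitted: it does no synchronization
        yield "barrier"
```

Here `run(fig_1a, 4)` reports a hang, `run(fig_1c, 4)` a conflict, and `run(fig_1g, 4)` a successful run, matching the informal discussion of those figures.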


3  Barrier  Inference 

The rules of our inference system model two aspects of SPMD computation. The first aspect is the sequence of
barriers and broadcasts executed in evaluating an expression e. The rules associate an abstract synchronization
sequence with e:

\[ S = \{\bot, \top\} \cup \{b, r\}^* \]

A sequence value $s \in \{b, r\}^*$ means every process executes exactly the sequence s of barriers (b) and broadcasts
(r). A sequence value $\top$ means every process executes the same sequence of barriers and broadcasts, but the
exact sequence is unknown. The sequence value $\bot$ means no process executes the expression. It is possible to assign
an element of S to every structurally correct expression. There is an ordering on synchronization sequences:

\[ \bot \le s \le \top \quad \text{for any } s \in \{b, r\}^* \]
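The ordering makes the synchronization sequences a flat lattice: two distinct known sequences are incomparable, so their least upper bound is the "unknown" element. A minimal sketch, assuming the hypothetical encodings `BOT` and `TOP` for the bottom and top elements and plain strings over 'b' and 'r' for known sequences:

```python
BOT, TOP = "BOT", "TOP"   # hypothetical encodings; known sequences are strings over "br"

def leq(s, t):
    """s <= t in the flat ordering: BOT below everything, TOP above everything."""
    return s == BOT or t == TOP or s == t

def join(s, t):
    """Least upper bound of two synchronization sequences."""
    if s == BOT:
        return t
    if t == BOT:
        return s
    return s if s == t else TOP   # distinct known sequences only agree on "unknown"
```

For example, `join("bb", "bb")` is `"bb"`, while `join("bb", "b")` is `TOP`: both branches synchronize, but the analysis no longer knows how.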

The second aspect of the inference rules tracks single-valued variables. An abstract environment $\mathit{AEnv} : \mathit{Vars} \to \{+, \circ\}$
is a mapping from program variables to $+$ (indicating a variable is single-valued) or $\circ$ (indicating a variable
may be multi-valued). There is an ordering $+ \le \circ$.

Analogous to an abstract environment there is an abstract function environment, which is a mapping

\[ \mathit{FEnv} : \mathit{FunctionNames} \to \{+, \circ\}^n \times \mathit{AEnv} \times \{+, \circ\} \times \mathit{AEnv} \times S \]

from function names to function signatures.

Definition 3.1 A function $f$ satisfies a function signature written

\[ (a_1, \ldots, a_n), A \to a, A', s \]

if the following hold: $f$ has n arguments and its free variables are those in $\mathrm{dom}(A) = \mathrm{dom}(A')$; processes that begin
execution of $f$ in states agreeing on values of the single-valued function arguments in $(a_1, \ldots, a_n)$ and single-valued
variables in $A$, either diverge or (1) agree on the result if $a = +$, (2) agree on the value of every single-valued
variable in $A'$, and (3) have executed the same sequence of synchronization operations s.
For example, the signature

\[ f : (+, \circ), \emptyset \to +, \emptyset, \epsilon \]

says that $f(a, b) = f(a, c)$ for all b and c (provided both evaluations terminate) and $f$ executes no synchronization
operations. The inference system proves statements of the form

\[ B, A \vdash \mathit{Expr} : a, A', s \]

which is read: Given functions with abstract function environment $B$, if all processes begin the execution of Expr
with the same values for variables marked single-valued in $A$, then all processes that terminate (1) agree on the
values of variables marked single-valued in $A'$, (2) agree on the result if $a = +$, and (3) have executed the same
sequence of synchronization operations s. Thus, any such proof shows e's structural correctness (Definition 1.1).

The inference rules are given in Figure 3. In the remainder of this section we discuss the rules, present a soundness
result, and illustrate barrier inference with some examples. The [Int] rule is simple; evaluating an integer is
single-valued (all processes compute the same integer), does not affect the set of single-valued variables in the environment,
and executes no synchronization operations. The [Id] rule is similar; the result is single-valued only if all processes
have the same value for the identifier in the environment. A communicate is assumed to be multi-valued, as
processes may receive different values. When a process needs to communicate a value to all processes, broadcast
is more efficient than n communicate operations, and makes explicit that the result is single-valued³. A barrier
and a broadcast are always single-valued and each executes a single synchronization operation. The [Prim] rule
says that primitive, side-effect free functions are single-valued if all their arguments are single-valued.

In rule [Fun], actual parameters must be single-valued wherever the function signature requires single-valued
arguments (the comparisons $a_i \le a_i'$). Similarly, the environment of the call must be single-valued in all variables the
signature requires be single-valued. We define $A_1 \le A_2$ if $\mathrm{dom}(A_1) = \mathrm{dom}(A_2)$ and for all $x \in \mathrm{dom}(A_1)$ we have
$A_1(x) \le A_2(x)$.
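The single-valuedness side of a call check can be sketched as follows. This is an illustrative reading of the [Fun] rule, not the prototype's code: the function `check_call`, the tuple layout of a signature, and the string encodings "+" and "o" are all assumptions, and the synchronization sequence of the call is left out to keep the sketch short.

```python
SINGLE, MULTI = "+", "o"   # hypothetical encodings of single- and multi-valued

def a_leq(a, b):
    """Ordering on abstract values: single-valued is below multi-valued."""
    return a == SINGLE or b == MULTI

def env_leq(A1, A2):
    """A1 <= A2: same domain and pointwise ordered."""
    return A1.keys() == A2.keys() and all(a_leq(A1[x], A2[x]) for x in A1)

def check_call(sig, arg_vals, env):
    """sig = (params, A_in, result, A_out). Returns (result, env after the
    call) if the call site satisfies the signature, otherwise None."""
    params, A_in, result, A_out = sig
    if len(arg_vals) != len(params):
        return None
    # Actuals must be single-valued wherever the signature demands it.
    if not all(a_leq(a, p) for a, p in zip(arg_vals, params)):
        return None
    # The caller's environment, restricted to dom(A_in), must be below A_in.
    if any(x not in env for x in A_in):
        return None
    if not env_leq({x: env[x] for x in A_in}, A_in):
        return None
    # Variables the function may touch take their values from A_out;
    # all other variables keep their values from the caller's environment.
    return result, {**env, **A_out}


# f takes one single-valued and one unconstrained argument, needs g single-valued,
# returns a single-valued result, and may leave g multi-valued.
sig = ([SINGLE, MULTI], {"g": SINGLE}, SINGLE, {"g": MULTI})
ok = check_call(sig, [SINGLE, MULTI], {"g": SINGLE, "h": SINGLE})
```

In this example `ok` is `("+", {"g": "o", "h": "+"})`; passing a multi-valued first argument, or calling with `g` multi-valued, makes `check_call` return `None`.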

The conclusion of [Fun] and several of the other rules combine synchronization sequences. The sequence $s_1 \otimes s_2$ is
the best description of $s_1$ followed by $s_2$:

\[
s_1 \otimes s_2 =
\begin{cases}
s_1 \cdot s_2 & \text{if } s_1, s_2 \in \{b, r\}^* \\
\bot & \text{if } s_1 = \bot \lor s_2 = \bot \\
s_1 \sqcup s_2 & \text{otherwise}
\end{cases}
\]

where $s_1 \cdot s_2$ is the concatenation of strings $s_1$ and $s_2$. The operator $\otimes$ is monotonic in both arguments.
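The case analysis above translates almost directly into code. The sketch below assumes that the second case yields the bottom element (if no process executes one part, no process executes the whole sequence) and that the "otherwise" case, where at least one side is unknown and neither is bottom, therefore evaluates to the top element. `BOT`, `TOP`, and `then` are hypothetical names; known sequences are strings over 'b' and 'r'.

```python
BOT, TOP = "BOT", "TOP"   # hypothetical encodings of the bottom and top elements

def then(s1, s2):
    """Best description of s1 followed by s2."""
    if s1 == BOT or s2 == BOT:
        return BOT            # assumed: an unexecuted part makes the whole unexecuted
    if s1 == TOP or s2 == TOP:
        return TOP            # the join of an unknown sequence with anything non-bottom
    return s1 + s2            # both known: plain string concatenation
```

For instance, `then("b", "r")` is `"br"`, `then("b", TOP)` is `TOP`, and `then(BOT, "b")` is `BOT`.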

Note the difference between the treatment of primitive and user-defined functions. The result of a primitive function
is single-valued if all its arguments are single-valued, which can be thought of as a kind of subtyping rule. Thus,
some uses of a primitive function can be single-valued and others not. All the calls to a user-defined function
are either single-valued or not, depending on the function's signature in the abstract function environment. This
distinction is necessary, because user-defined functions with side-effects can modify single-valued state. We have
not found this restriction on user-defined functions to be a problem in practice (see Section 5.1).

The  [Assign]  rule  updates  the  environment  based  on  the  new  value  of  the  assigned  variable;  this  reflects  the  fact 
that  a  variable  can  be  single-valued  at  some  points  in  the  program  and  not  at  others.  The  [Let]  rule  introduces  a 
new  variable,  which  is  initially  single-valued  as  it  is  initialised  to  0  in  all  processes.  A  new  function  is  introduced 
into  the  function  environment  by  the  [LetRec]  rule.  This  rule,  along  with  the  [Fun]  rule,  only  expresses  constraints 
on  the  function’s  signature,  but  does  not  specify  how  it  is  found.  Section  3.2  outlines  a  method  for  computing 
function  signatures. 

The two rules for if are interesting. The rule [If-Single] applies when the predicate is single-valued. All processes
take the same branch, but we do not know which branch. In this case a conservative upper bound over the results
of both branches suffices.

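The rule [If-Multi] applies when the predicate is multi-valued. It is necessary that the upper bound of the
synchronization sequence of the branches be a known (not $\top$) sequence. A subtle point is determining the single-valued
variables of the final environment. Any variable that is modified in either branch could have different values in
different processes on exit from the conditional; all of these variables must be marked multi-valued in the final
environment. Let $AV(e)$ be the set of variables visible at e that may be assigned in the evaluation of e (including
via function calls in e). The set $AV(e)$ is easily computed. Now define $A \triangleleft \{v_1, \ldots, v_n\}$ to be $A[v_1 \leftarrow \circ, \ldots, v_n \leftarrow \circ]$.

\[ \text{[Int]} \quad B, A \vdash i : +, A, \epsilon \]

\[ \text{[Id]} \quad B, A \vdash \mathit{id} : A(\mathit{id}), A, \epsilon \]

\[ \text{[Comm]} \quad B, A \vdash \mathtt{communicate} : \circ, A, \epsilon \]

\[ \text{[Barrier]} \quad B, A \vdash \mathtt{barrier} : +, A, b \]

\[ \text{[Broadcast]} \quad B, A \vdash \mathtt{broadcast} : +, A, r \]

\[
\text{[Fun]} \quad
\dfrac{\begin{array}{l}
B, A_0 \vdash \mathit{Expr}_1 : a_1, A_1, s_1 \\
\ldots \\
B, A_{n-1} \vdash \mathit{Expr}_n : a_n, A_n, s_n \\
B(f) = (a_1', \ldots, a_n'), A \to a, A', s \\
A_n|_{\mathrm{dom}(A)} \le A \\
\forall\, 1 \le i \le n.\; a_i \le a_i'
\end{array}}
{B, A_0 \vdash f(\mathit{Expr}_1, \ldots, \mathit{Expr}_n) : a,\; A_n // \mathrm{dom}(A') + A',\; s_1 \otimes \ldots \otimes s_n \otimes s}
\]

\[
\text{[Prim]} \quad
\dfrac{\begin{array}{l}
B, A_0 \vdash \mathit{Expr}_1 : a_1, A_1, s_1 \\
\ldots \\
B, A_{n-1} \vdash \mathit{Expr}_n : a_n, A_n, s_n
\end{array}}
{B, A_0 \vdash p(\mathit{Expr}_1, \ldots, \mathit{Expr}_n) : a_1 \sqcup \ldots \sqcup a_n,\; A_n,\; s_1 \otimes \ldots \otimes s_n}
\]

\[
\text{[Assign]} \quad
\dfrac{B, A \vdash \mathit{Expr} : a, A', s}
{B, A \vdash x \leftarrow \mathit{Expr} : a,\; A'[x \leftarrow a],\; s}
\]

\[
\text{[Let]} \quad
\dfrac{B, A[x \leftarrow +] \vdash \mathit{Expr} : a, A', s}
{B, A \vdash \mathtt{let}\ x\ \mathtt{in}\ \mathit{Expr} : a,\; A' // \{x\},\; s}
\]

\[
\text{[LetRec]} \quad
\dfrac{\begin{array}{l}
\mathrm{dom}(A) = \mathrm{dom}(A') = \mathrm{dom}(A_0) \\
\sigma = (a_1, \ldots, a_n), A \to a, A', s \\
A' = A'' // \{x_1, \ldots, x_n\} \\
B[f \leftarrow \sigma], A[x_1 \leftarrow a_1, \ldots, x_n \leftarrow a_n] \vdash \mathit{Expr}_1 : a, A'', s \\
B[f \leftarrow \sigma], A_0 \vdash \mathit{Expr}_2 : a_2, A_2, s_2
\end{array}}
{B, A_0 \vdash \mathtt{letrec}\ f(x_1, \ldots, x_n) = \mathit{Expr}_1\ \mathtt{in}\ \mathit{Expr}_2 : a_2, A_2, s_2}
\]

\[
\text{[If-Single]} \quad
\dfrac{\begin{array}{l}
B, A_0 \vdash \mathit{Expr}_1 : +, A_1, s_1 \\
B, A_1 \vdash \mathit{Expr}_2 : a_2, A_2, s_2 \\
B, A_1 \vdash \mathit{Expr}_3 : a_3, A_3, s_3
\end{array}}
{B, A_0 \vdash \mathtt{if}\ \mathit{Expr}_1\ \mathit{Expr}_2\ \mathtt{else}\ \mathit{Expr}_3 : a_2 \sqcup a_3,\; A_2 \sqcup A_3,\; s_1 \otimes (s_2 \sqcup s_3)}
\]

\[
\text{[If-Multi]} \quad
\dfrac{\begin{array}{l}
B, A_0 \vdash \mathit{Expr}_1 : \circ, A_1, s_1 \\
B, A_1 \vdash \mathit{Expr}_2 : a_2, A_2, s_2 \\
B, A_1 \vdash \mathit{Expr}_3 : a_3, A_3, s_3 \\
s_2 \sqcup s_3 \neq \top \\
A' = A_1 \triangleleft (AV(\mathit{Expr}_2) \cup AV(\mathit{Expr}_3))
\end{array}}
{B, A_0 \vdash \mathtt{if}\ \mathit{Expr}_1\ \mathit{Expr}_2\ \mathtt{else}\ \mathit{Expr}_3 : \circ,\; A',\; s_1 \otimes (s_2 \sqcup s_3)}
\]

\[
\text{[Sequence]} \quad
\dfrac{\begin{array}{l}
B, A_0 \vdash \mathit{Expr}_1 : a_1, A_1, s_1 \\
B, A_1 \vdash \mathit{Expr}_2 : a_2, A_2, s_2
\end{array}}
{B, A_0 \vdash \mathit{Expr}_1; \mathit{Expr}_2 : a_2,\; A_2,\; s_1 \otimes s_2}
\]

Figure 3: Inference rules.

³ Our experience with the Split-C programs of Section 5 shows that this rule is nearly universally followed.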

If  the  inference  system  of  Figure  3  cannot  assign  any  synchronization  value  to  an  expression,  then  evaluating  the 
expression  may  cause  processes  to  execute  differing  numbers  of  barriers  and  broadcasts — the  program  may  get  “out 
of  synch”.  In  this  case  the  program  is  rejected.  Of  course,  the  inference  system  is  conservative  and  may  reject 
correct  programs.  We  show  in  Section  5.1  that  the  system  in  fact  works  well  on  a  suite  of  benchmarks. 
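As an illustration of how the rules fit together, here is a minimal sketch of an inference function for the fragment without let, letrec, and user-defined function calls. It is not the prototype's implementation: the tuple AST, the names `infer`, `assigned`, and `Rejected`, and the encodings of the abstract values are all assumptions made for this example, and random() from Figure 1 is modeled by communicate, since both may return different values in different processes.

```python
SINGLE, MULTI = "+", "o"      # hypothetical encodings of abstract values
BOT, TOP = "BOT", "TOP"       # bottom / unknown synchronization sequences

class Rejected(Exception):
    """Raised when no rule applies, i.e. the program is rejected."""

def join_val(a, b):
    return SINGLE if a == b == SINGLE else MULTI

def join_env(A, B):
    return {x: join_val(A[x], B[x]) for x in A}

def join_seq(s, t):
    if s == BOT:
        return t
    if t == BOT:
        return s
    return s if s == t else TOP

def then(s, t):
    if BOT in (s, t):
        return BOT
    if TOP in (s, t):
        return TOP
    return s + t

def assigned(e):
    """Variables that may be assigned while evaluating e (AV in the text)."""
    tag = e[0]
    if tag == "assign":
        return {e[1]} | assigned(e[2])
    if tag == "prim":
        return set().union(*[assigned(a) for a in e[2]])
    if tag in ("seq", "if"):
        return set().union(*[assigned(sub) for sub in e[1:]])
    return set()

def infer(e, A):
    """Return (a, A', s) for expression e in abstract environment A."""
    tag = e[0]
    if tag == "int":
        return SINGLE, A, ""
    if tag == "id":
        return A[e[1]], A, ""
    if tag == "communicate":
        return MULTI, A, ""
    if tag == "barrier":
        return SINGLE, A, "b"
    if tag == "broadcast":
        return SINGLE, A, "r"
    if tag == "prim":                       # ("prim", name, [args])
        a, s = SINGLE, ""
        for arg in e[2]:
            ai, A, si = infer(arg, A)
            a, s = join_val(a, ai), then(s, si)
        return a, A, s
    if tag == "assign":                     # ("assign", x, expr)
        a, A1, s = infer(e[2], A)
        return a, {**A1, e[1]: a}, s
    if tag == "seq":
        _, A1, s1 = infer(e[1], A)
        a2, A2, s2 = infer(e[2], A1)
        return a2, A2, then(s1, s2)
    if tag == "if":
        a1, A1, s1 = infer(e[1], A)
        a2, A2, s2 = infer(e[2], A1)
        a3, A3, s3 = infer(e[3], A1)
        s23 = join_seq(s2, s3)
        if a1 == SINGLE:                    # all processes take the same branch
            return join_val(a2, a3), join_env(A2, A3), then(s1, s23)
        if s23 == TOP:                      # multi-valued predicate needs a known sequence
            raise Rejected("branches may synchronize differently")
        changed = (assigned(e[2]) | assigned(e[3])) & A1.keys()
        return MULTI, {**A1, **{x: MULTI for x in changed}}, then(s1, s23)
    raise ValueError(f"unknown expression tag: {tag}")


two_barriers = ("seq", ("barrier",), ("barrier",))
fig_1c = ("if", ("communicate",), ("barrier",), ("broadcast",))
fig_1g = ("if", ("communicate",), two_barriers, two_barriers)
fig_1e = ("if", ("id", "x"), ("barrier",), ("int", 0))
```

With these definitions, `infer(fig_1g, {})` yields the known sequence `"bb"`, `infer(fig_1e, {"x": "+"})` succeeds with an unknown sequence, and both `infer(fig_1c, {})` and `infer(fig_1e, {"x": "o"})` raise `Rejected`.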

3.1  Soundness 

A sticky point in trying to prove our system correct is capturing the meaning of single-valued variables. Intuitively,
a variable is single-valued if all processors have the same value for the variable at the same time. However, "at the
same time" is a slippery notion in a setting with asynchronous execution. Only at points of global synchronization
(i.e., barriers, broadcasts, and the start and end of execution) is it possible to assert anything useful about the
state of all processes.

The key to this problem is to observe that the values of single-valued variables depend only on other single-valued
expressions. Using this fact, it can be shown (without referring to time except within a single process) that
if processes begin execution agreeing on single-valued inputs, then they terminate agreeing on the single-valued
outputs.

The proof of soundness has two steps. First, we prove that single-valued outputs are determined solely by
single-valued inputs for a process in isolation. Second, we show that if the inference rules can derive any proof for an
expression, then all processes evaluating that expression execute the same sequence of synchronization operations.

A few definitions are required. Environments $E_1$ and $E_2$ are equal with respect to an abstract environment $A$,
written $E_1 \sim_A E_2$, if $\mathrm{dom}(E_1) = \mathrm{dom}(E_2) = \mathrm{dom}(A)$ and $\forall x.\, A(x) = + \Rightarrow E_1(x) = E_2(x)$. A function environment
$F$ and an abstract function environment $B$ are compatible, written $F : B$, if $\mathrm{dom}(F) = \mathrm{dom}(B)$ and for all
$f \in \mathrm{dom}(F)$:

•  $F(f) = f(x_1, \ldots, x_n) = \mathit{Expr}$

•  $B(f) = (a_1, \ldots, a_n), A \to a, A', s$

•  $B|_{FF(f)}, A[x_1 \leftarrow a_1, \ldots, x_n \leftarrow a_n] \vdash \mathit{Expr} : a, A'', s$

•  $A' = A'' // \{x_1, \ldots, x_n\}$

•  ... 

An execution $\mathit{state}_1 \xrightarrow{t} \mathit{state}_2$ is an execution with synchronization sequence t, where t is a string with one b for
each barrier and one r for each broadcast executed. The broadcast sequence of an evaluation
$[S_1, \ldots, S_n] \to^* [S_1', \ldots, S_n']$ is the sequence of values returned by successive calls to broadcast during this evaluation.

Lemma 3.2 Let e be any expression and let $B, A \vdash e : a, A', s$. Let $E_1 \sim_A E_2$, and $F : B$. If

\[ [\langle F, E_1, C_1, e \rangle] \xrightarrow{t_1} [C_1(E_1', i_1)] \]

\[ [\langle F, E_2, C_2, e \rangle] \xrightarrow{t_2} [C_2(E_2', i_2)] \]

and the broadcast sequences of both evaluations are identical, then the following are all true:

•  $t_1 = t_2$ and $t_2 \le s$

•  $E_1' \sim_{A'} E_2'$

•  $a = + \Rightarrow i_1 = i_2$

Theorem 3.3 Let e be any expression and let $B, A \vdash e : a, A', s$. Let $F : B$ and $E_i \sim_A E_j$ for $i, j = 1..n$. Then

\[ [\langle F, E_1, I, e \rangle, \ldots, \langle F, E_n, I, e \rangle] \to^* [(E_1', v_1), \ldots, (E_n', v_n)] \]

or some process diverges.

The  proofs  of  Lemma  3.2  and  Theorem  3.3  are  in  Appendix  A. 

The semantics of Figure 2 does not handle synchronization errors, i.e., the cases where barriers and broadcasts
are mismatched, or where some processes wait at a barrier while other processes have terminated. In those
cases, the evaluation hangs. Theorem 3.3 shows that this cannot occur with barrier inference: either the program
terminates, or the evaluation sequence is infinite. Appendix D extends $\mathcal{L}$'s semantics with error checking rules
for synchronization errors and then shows that Theorem 3.3 still holds, which proves that barrier inference makes
runtime error checking for synchronization unnecessary.

3.2  Implementation 

The  only  difficulty  in  translating  the  inference  rules  into  an  inference  algorithm  is  in  the  determination  of  the 
assumptions  to  use  in  function  environments.  We  define  G,  the  global  abstract  function  environment  which  is 
identical  to  the  abstract  function  environment  B,  except  that  it  contains  the  signatures  of  all  functions  of  a 
program  instead  of  those  currently  in  scope.  Using  a  global  environment  poses  no  problems  as  all  function  names 
are  assumed  to  be  unique. 

A  global  function  environment  can  be  used  to  attempt  to  construct  a  proof: 

$\emptyset, \emptyset \vdash e : a, \emptyset, s$

for an expression $e$ by choosing $B = G|_{dom(B)}$ at each step of the proof derived from the structure of $e$: the
other quantities ($A$, $A'$, $a$, $s$) can easily be computed once $B$ is known. The proof thus constructed may however
be incorrect. The goal of an implementation is to compute $G$ such that a correct proof can be built from $G$, or to
report that no proof exists, i.e. that program $e$ is incorrect.

A  value  for  G  is  found  by  recasting  the  inference  rules  as  a  function 

$I(G, A, Expr) = (G', a, A', s, error)$ : $FEnv \times AEnv \times Expr \to FEnv \times \{+, \circ\} \times AEnv \times S \times Bool$

The definition of $I$ is in Figure 4.

A call to $I(G, A, Expr)$ computes the properties of an expression assuming that all the functions behave as described
in $G$ and that $A$ describes the single-valuedness of the free variables of $Expr$. The function $I$ is total: when the
[If-Multi] rule would fail, $I$ simply returns an error indication in the last component of its result. This error is
propagated back to the top-level expression. The [Fun] and [LetRec] inference rules express constraints on signatures
in the abstract function environment: [Fun] requires that the single-valuedness of the arguments and free variables
match the signature of the function, and [LetRec] requires that the inferred signature of the function's body equals
the signature in the abstract function environment. When these constraints conflict with the function signatures of
the global function environment $G$ passed to $I$, $I$ simply returns a new global function environment $G'$ which
satisfies all these constraints. Because the constraints depend on the assumed environment $G$, the computed
environment $G'$ may not satisfy the constraints that are computed by $I(G', A, Expr)$. However, a fixed point
$G'' = I(G'', A, Expr)$ of $I$ does satisfy all the constraints.

Theorem 3.4 Given a program $p$, all the following are true:

• The global abstract function environments $G$ of $p$ form a lattice of finite height.

• $I$ is monotonic in its $G$ argument.

• A fixed point $I(G, \emptyset, p) = (G, a, A, s, false)$ exists iff a proof can be built with the inference rules of Figure 3.

• Any proof built from the inference rules defines a global abstract environment $G'$ such that $G \sqsubseteq G'$, where $G$
is the least fixed-point of $I$. Conversely, if there is any proof, then a proof can be constructed using $G$.

A proof is outlined in Appendix B.

$I(G, A, i) = (G, +, A, \epsilon, false)$

$I(G, A, id) = (G, A(id), A, \epsilon, false)$

$I(G, A, communicate) = (G, \circ, A, \epsilon, false)$

$I(G, A, barrier) = (G, +, A, b, false)$

$I(G, A, broadcast) = (G, +, A, r, false)$

$I(G, A_0, f(Expr_1, \ldots, Expr_n)) =$
    let $(G_1, a_1, A_1, s_1, e_1) = I(G, A_0, Expr_1)$
    and ...
    and $(G_n, a_n, A_n, s_n, e_n) = I(G, A_{n-1}, Expr_n)$
    and $G' = G_1 \sqcup \ldots \sqcup G_n$
    and $(a'_1, \ldots, a'_n), A \to a, A', s = G'(f)$
    in $(G'[f \leftarrow (a'_1 \sqcup a_1, \ldots, a'_n \sqcup a_n), A \sqcup (A_n|_{dom(A)}) \to a, A', s],$
        $a,\ A_n // dom(A') + A',\ s_1 \oplus \ldots \oplus s_n \oplus s,\ e_1 \vee \ldots \vee e_n)$

$I(G, A_0, p(Expr_1, \ldots, Expr_n)) =$
    let $(G_1, a_1, A_1, s_1, e_1) = I(G, A_0, Expr_1)$
    and ...
    and $(G_n, a_n, A_n, s_n, e_n) = I(G, A_{n-1}, Expr_n)$
    and $G' = G_1 \sqcup \ldots \sqcup G_n$
    in $(G',\ a_1 \sqcup \ldots \sqcup a_n,\ A_n,\ s_1 \oplus \ldots \oplus s_n,\ e_1 \vee \ldots \vee e_n)$

$I(G, A, x \leftarrow Expr) =$
    let $(G', a, A', s, e) = I(G, A, Expr)$ in $(G', a, A'[x \leftarrow a], s, e)$

$I(G, A, let\ x\ in\ Expr) =$
    let $(G', a, A', s, e) = I(G, A[x \leftarrow +], Expr)$ in $(G', a, A' // \{x\}, s, e)$

$I(G, A_0, letrec\ f(x_1, \ldots, x_m) = Expr_1\ in\ Expr_2) =$
    let $(a_1, \ldots, a_m), A \to a, A', s = G(f)$
    and $(G_1, a'_1, A_1, s_1, e_1) = I(G, A[x_1 \leftarrow a_1, \ldots, x_m \leftarrow a_m], Expr_1)$
    and $(G_2, a'_2, A_2, s_2, e_2) = I(G, A_0, Expr_2)$
    and $G' = G_1 \sqcup G_2$
    and $(a''_1, \ldots, a''_m), A \to a, A', s = G'(f)$
    in $(G'[f \leftarrow (a''_1, \ldots, a''_m), A \to a'_1, A_1 // \{x_1, \ldots, x_m\}, s_1],$
        $a'_2,\ A_2,\ s_2,\ e_1 \vee e_2)$

$I(G, A_0, if\ Expr_1\ Expr_2\ else\ Expr_3) =$
    let $(G_1, a_1, A_1, s_1, e_1) = I(G, A_0, Expr_1)$
    and $(G_2, a_2, A_2, s_2, e_2) = I(G, A_1, Expr_2)$
    and $(G_3, a_3, A_3, s_3, e_3) = I(G, A_1, Expr_3)$
    and $G' = G_1 \sqcup G_2 \sqcup G_3$
    and $e' = e_1 \vee e_2 \vee e_3$
    and $A' = A_2 \sqcup A_3$
    and $s' = s_1 \oplus (s_2 \sqcup s_3)$
    in if $a_1 = +$ then $(G', a_2 \sqcup a_3, A', s', e')$
       else $(G', \circ, A' \triangleleft (AV(Expr_2) \cup AV(Expr_3)), s', e' \vee (s_2 \sqcup s_3) = \top)$

$I(G, A_0, Expr_1; Expr_2) =$
    let $(G_1, a_1, A_1, s_1, e_1) = I(G, A_0, Expr_1)$
    and $(G_2, a_2, A_2, s_2, e_2) = I(G, A_1, Expr_2)$
    in $(G_1 \sqcup G_2, a_2, A_2, s_1 \oplus s_2, e_1 \vee e_2)$

Figure 4: Inference Function

The  inference  algorithm  is  thus: 

1.  Set  G  to  the  bottom  element  of  the  abstract  function  environment  lattice. 

2. Iterate $(G, a, A', s, error) = I(G, \emptyset, p)$ until $G$ converges.

3.  If  error  is  true,  then  report  that  p  is  erroneous. 
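
The three steps above can be illustrated with a small Python sketch. It is not the inference function of Figure 4: the toy program, the `step` function standing in for $I$, and the reduction of a signature to a single result value in {"+", "o"} are all assumptions made for illustration, and the error flag is computed after convergence rather than by each call to $I$.

```python
# Two-point lattice for single-valuedness: "+" (single) lies below "o" (multi).
def join(a, b):
    return "+" if a == "+" and b == "+" else "o"

# Hypothetical toy program: each function's result is the join of a local
# value and the results of the functions it calls.
PROGRAM = {
    "main": ("+", ["f", "g"]),
    "f": ("+", ["g"]),
    "g": ("o", []),      # g returns a communicated (multi-valued) value
    "h": ("+", ["h"]),   # recursive, but purely single-valued
}

def step(G):
    """Toy stand-in for I: recompute every signature assuming G."""
    G_next = {}
    for name, (local, callees) in PROGRAM.items():
        a = local
        for callee in callees:
            a = join(a, G[callee])
        # Joining with the old value keeps the iteration monotonic.
        G_next[name] = join(G[name], a)
    return G_next

def infer(must_be_single):
    G = {name: "+" for name in PROGRAM}  # step 1: bottom element
    while True:                          # step 2: iterate until G converges
        G_next = step(G)
        if G_next == G:
            break
        G = G_next
    # Step 3: report an error if a value required to be single-valued is not.
    error = any(G[f] != "+" for f in must_be_single)
    return G, error

G, error = infer(["h"])
```

Because the lattice has finite height and `step` never lowers a signature, the loop terminates; here it converges after two productive rounds.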

In  practice,  we  believe  that  a  checking  algorithm  based  on  the  language  extensions  of  Section  4.1  is  more  important, 
and  also  easier  to  implement.  We  discuss  our  implementation  of  such  a  system  in  Section  5.1. 


3.3  Examples 

We conclude with example applications of the inference rules to Figures 1a and 1e. Other worked examples are
included in Appendix C for the interested reader. The functions random() and work() do not contain barriers or
modify visible variables.

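Figure 1a fails the [If-Multi] rule: the alternatives of the if have different synchronization sequences.

[If-Multi]
$$\frac{\emptyset, \emptyset \vdash random() : \circ, \emptyset, \epsilon \quad \emptyset, \emptyset \vdash barrier : +, \emptyset, b \quad \emptyset, \emptyset \vdash 0 : +, \emptyset, \epsilon \quad b \sqcup \epsilon = \top \Rightarrow \text{the rule fails}}{\emptyset, \emptyset \vdash if\ random()\ barrier\ else\ 0 :\ ?}$$

Figure 1e successfully passes the inference rules, assuming x is single-valued:

[If-Single]
$$\frac{\emptyset, \{x : +\} \vdash x : +, \{x : +\}, \epsilon \quad \emptyset, \{x : +\} \vdash barrier : +, \{x : +\}, b \quad \emptyset, \{x : +\} \vdash work() : \circ, \{x : +\}, \epsilon}{\emptyset, \{x : +\} \vdash if\ (x)\ barrier\ else\ work() : \circ, \{x : +\}, \epsilon \oplus (b \sqcup \epsilon) = \top}$$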
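
The synchronization-sequence part of these two derivations can be replayed with a short Python sketch. The encoding is an assumption made for illustration: sequences are Python strings over "b" and "r" (the empty string stands for $\epsilon$), the names `BOT` and `TOP` stand for $\bot$ and $\top$, and the check on assigned variables in [If-Multi] is left out.

```python
BOT, TOP = "BOT", "TOP"  # hypothetical encodings of the lattice extremes

def seq_join(s, t):
    """Least upper bound: equal sequences stay, different ones give TOP."""
    if s == BOT:
        return t
    if t == BOT:
        return s
    return s if s == t else TOP

def seq_concat(s, t):
    """Sequential composition; BOT is strict and TOP is absorbing."""
    if BOT in (s, t):
        return BOT
    if TOP in (s, t):
        return TOP
    return s + t

def check_if(cond_value, s1, s2, s3):
    """Return (sequence, ok) for `if Expr1 Expr2 else Expr3`.

    cond_value is "+" for a single-valued condition, "o" otherwise.
    """
    branches = seq_join(s2, s3)
    ok = cond_value == "+" or branches != TOP
    return seq_concat(s1, branches), ok

figure_1a = check_if("o", "", "b", "")  # if random() barrier else 0
figure_1e = check_if("+", "", "b", "")  # if (x) barrier else work()
```

Both conditionals have the sequence `TOP`, but only the multi-valued one is rejected, which is the distinction the two rules draw.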


4  Realistic  Languages 

We  now  turn  to  the  use  of  our  techniques  in  realistic  programming  languages.  Section  4.1  presents  features  we 
believe  every  SPMD  language  design  should  include.  Section  4.2  applies  barrier  inference  to  heterogeneous  parallel 
computing,  while  Section  4.3  discusses  modifications  needed  to  incorporate  our  techniques  in  programs  written  in 
C  or  FORTRAN-based  languages. 

4.1  SPMD  Language  Design 

Current SPMD languages have few ways of indicating the synchronization structure of an application. Even
with barrier inference, this makes SPMD programs unnecessarily difficult to read and maintain. We propose
two language features that make this structure more explicit: named barriers and a single keyword to declare
single-valued variables and functions.

Some SPMD languages provide named barriers, with the semantics that a runtime error results if processes
simultaneously execute barriers with different names. Using named barriers indicates which syntactic barriers may
participate in a synchronization. Named barriers also make the difference between [If-Multi] and [If-Single] explicit:
an [If-Multi] must use the same barrier names in both branches, while an [If-Single] should use different names.
Usually named barriers are implemented using a broadcast (so the names can be compared), which is much slower
than special-purpose barrier hardware (e.g., on the CM5 [17] and T3D [4]). But $\mathcal{L}$ already effectively has two
barrier names: barrier and broadcast. Adding more names increases the alphabet of synchronization strings
but has no impact on inference complexity. Our system thus allows named barriers to be checked at compile-time,
allowing their implementation with the more efficient anonymous barriers. In a language with barrier inference
there are only advantages to using named barriers.

Our inference system makes clear that knowing the single-valued variables is crucial to understanding an SPMD
program's synchronization structure. We believe programmers ought to declare single-valued variables, formal
parameters, and function results. These declarations are checked by a revised inference system. We propose a
keyword single used as a type modifier (e.g., single int x;). The modifications to the language definition are:


Expr  ::= 

|  let  Decl  in  Expr 

|  letrec  Decl(Decl,  .  . . ,  Decl)  =  Expr  in  Expr 

Decl  ::=  id 

|  single  id 


Declaring single-valuedness has two advantages. First, the program is clearer as the common parts of the data-flow
are explicit. Second, barrier inference is simplified. Because abstract environments can be built directly from
single declarations instead of computed, proofs

$B, A \vdash Expr : a, s$

no longer need a result environment. Function signatures

$(a_1, \ldots, a_n) \to a, s$

do not include environments either and can also be built from the declarations. Figure 5 shows the new inference
rules.


[Int]
$$B, A \vdash i : +, \epsilon$$

[Id]
$$B, A \vdash id : A(id), \epsilon$$

[Comm]
$$B, A \vdash communicate : \circ, \epsilon$$

[Barrier]
$$B, A \vdash barrier : +, b$$

[Broadcast]
$$B, A \vdash broadcast\ x : +, r$$

[Fun]
$$\frac{B, A \vdash Expr_1 : a_1, s_1 \quad \cdots \quad B, A \vdash Expr_n : a_n, s_n \quad B(f) = (a'_1, \ldots, a'_n) \to a, s \quad \forall 1 \le i \le n.\ a_i \sqsubseteq a'_i}{B, A \vdash f(Expr_1, \ldots, Expr_n) : a, s_1 \oplus \cdots \oplus s_n \oplus s}$$

[Prim]
$$\frac{B, A \vdash Expr_1 : a_1, s_1 \quad \cdots \quad B, A \vdash Expr_n : a_n, s_n}{B, A \vdash p(Expr_1, \ldots, Expr_n) : a_1 \sqcup \cdots \sqcup a_n, s_1 \oplus \cdots \oplus s_n}$$

[Assign]
$$\frac{B, A \vdash Expr : a, s \quad a \sqsubseteq A(x)}{B, A \vdash x \leftarrow Expr : a, s}$$

[Let]
$$\frac{B, A[x \leftarrow a] \vdash Expr : a', s}{B, A \vdash let\ a\ x\ in\ Expr : a', s}$$

[LetRec]
$$\frac{S = (a_1, \ldots, a_m) \to a_0, s \quad B[f \leftarrow S], A[x_1 \leftarrow a_1, \ldots, x_m \leftarrow a_m] \vdash Expr_1 : a_0, s \quad B[f \leftarrow S], A \vdash Expr_2 : a_2, s_2}{B, A \vdash letrec\ a_0\ f(a_1\ x_1, \ldots, a_m\ x_m) = Expr_1\ in\ Expr_2 : a_2, s_2}$$

[If-Single]
$$\frac{B, A \vdash Expr_1 : +, s_1 \quad B, A \vdash Expr_2 : a_2, s_2 \quad B, A \vdash Expr_3 : a_3, s_3}{B, A \vdash if\ Expr_1\ Expr_2\ else\ Expr_3 : a_2 \sqcup a_3, s_1 \oplus (s_2 \sqcup s_3)}$$

[If-Multi]
$$\frac{B, A \vdash Expr_1 : \circ, s_1 \quad B, A \vdash Expr_2 : a_2, s_2 \quad B, A \vdash Expr_3 : a_3, s_3 \quad s_2 \sqcup s_3 \neq \top \quad \forall x.\ A(x) = + \Rightarrow x \notin (AV(Expr_2) \cup AV(Expr_3))}{B, A \vdash if\ Expr_1\ Expr_2\ else\ Expr_3 : \circ, s_1 \oplus (s_2 \sqcup s_3)}$$

[Sequence]
$$\frac{B, A \vdash Expr_1 : a_1, s_1 \quad B, A \vdash Expr_2 : a_2, s_2}{B, A \vdash Expr_1; Expr_2 : a_2, s_1 \oplus s_2}$$

Figure 5: Inference rules with a single keyword.

4.2 Heterogeneous Computing

An interesting, though somewhat speculative, extension is to verify SPMD programs written for a heterogeneous
environment, i.e. an environment that includes computers with different processor architectures and hence different
data formats. A new problem arises: values that appear single-valued to the programmer may turn out to be slightly
different at runtime because of differences between the architectures involved. For instance, they may be using
slightly different precisions to compute intermediate results in floating point computations. Thus the innocuous
loop

    t = 0.0;
    while (t < final_t) {
        t += d1 * d2;
        barrier();
    }

might be executed a different number of times on different processors, even if they all have the same values for d1,
d2 and final_t.
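
The effect can be reproduced on one machine with a Python sketch that emulates two precisions for intermediate results. The helper `to_f32`, the function `count_iterations` and the input values are assumptions chosen for illustration; all three inputs are rounded to IEEE single precision first, so both emulated processors start from identical values.

```python
import struct

def to_f32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def count_iterations(d1, d2, final_t, round_intermediate):
    """Count loop iterations, i.e. the number of barriers executed."""
    t = 0.0
    n = 0
    while t < final_t:
        t = round_intermediate(t + round_intermediate(d1 * d2))
        n += 1  # one barrier() per iteration
    return n

# Identical inputs on both emulated processors.
d1, d2, final_t = to_f32(0.1), to_f32(1.0), to_f32(0.3)

# Processor A rounds every intermediate result to single precision.
narrow = count_iterations(d1, d2, final_t, to_f32)
# Processor B keeps intermediate results in double precision.
wide = count_iterations(d1, d2, final_t, lambda x: x)
```

With these inputs the narrow processor leaves the loop after 3 iterations and the wide one after 4, so one process would wait at a barrier that the other never reaches.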

Another problem in a heterogeneous environment is that different compilers are used to produce the executables.
Thus any implementation-dependent characteristics, such as the order of evaluation of operands in C, may vary.
This could easily cause problems in code like:

    a = f() + g();

where both f and g use barriers.

Our system can easily detect the former problem by supplying appropriate abstract signatures for primitive
functions, reflecting whether those primitives are guaranteed to produce the same value in all processes. The second
issue can be checked for by requiring that any set of statements whose order of evaluation is unspecified have a
synchronization sequence of $\epsilon$, and that none of these statements modify any single-valued variables.

4.3  Application  to  Existing  Languages 

Some features of C and FORTRAN, which are popular starting points for SPMD languages, complicate barrier
inference. Unstructured control-flow, aliasing, function pointers, and uninitialized data structures are problematic.
In this section we discuss how these language features can be handled. We have also extended these concepts to
handle object-oriented programming and exception handling, but we do not report on this work here for lack of
space.


4.3.1  Unstructured  Control-Flow 

Supporting unstructured control-flow (i.e., goto) requires the replacement of the [If-Single] and [If-Multi] rules
by  more  complex  mechanisms,  though  the  inter-procedural  aspects  of  the  inference  system  remain  unchanged. 
The  problem  can  be  divided  into  three  parts:  finding  the  single-valued  variables,  computing  the  synchronization 
sequence  of  a  function,  and  verifying  multi-valued  branches  do  not  cause  synchronization  problems. 

The inference of single-valued variables is very similar to the problem of binding-time analysis in partial
evaluation [12]: Given a set of variables whose value is assumed known (or single-valued in our case), determine which
expressions and variables have a value that depends solely on these variables. The following algorithm is similar
to [1], a binding-time analysis for C.

Finding single-valued variables

Outline: To find the single-valued variables of a function f:

1. Build the static single-assignment (SSA) form [6] for function f. This has two advantages:

(a) Each SSA variable is either single-valued or not. The status of variables at particular statements of the
function is of no concern.

(b) The points where different values of variables merge are explicit.

2. Build the branch dependences for each statement, i.e. the list of branch outcomes that determine whether a
statement is executed. The branch dependences are computed from the control-dependence relation.

3. From the branch dependences associated with each assignment, determine for each $\phi$-function in the SSA form
which branches must be taken in a single-valued fashion for the value of the $\phi$-function to be single-valued.

4. Determine for each SSA variable v its dependence set: All the variables that must be single-valued for v to
be single-valued.

5. Build and solve a set of constraints whose solution gives the single-valued variables.


Appendix  E  details  each  step  of  this  algorithm. 
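
Step 5 can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the algorithm of Appendix E: the dependence sets of step 4 are taken as given, the function name `single_valued` and the SSA variable names are hypothetical, and the constraints are solved by starting from the optimistic assumption that every variable is single-valued and removing variables until nothing changes.

```python
def single_valued(dep_sets, multi_sources):
    """Largest set of variables whose dependence sets are all single-valued.

    dep_sets maps each SSA variable to the set of variables that must be
    single-valued for it to be single-valued; multi_sources are variables
    known to be multi-valued (for example the process identifier).
    """
    single = set(dep_sets) - set(multi_sources)
    changed = True
    while changed:
        changed = False
        for v in sorted(single):  # iterate over a copy of the current set
            if not dep_sets[v] <= single:
                single.discard(v)
                changed = True
    return single

# Hypothetical SSA form of:
#   i = 0; while (i < n) i = i + 1; x = p + n;
# where n is a single-valued parameter and p is the process identifier.
deps = {
    "n1": set(),
    "p1": set(),
    "i1": set(),
    "i2": {"i1", "i3", "c1"},  # phi(i1, i3), merged under branch c1
    "i3": {"i2"},
    "c1": {"i2", "n1"},
    "x1": {"p1", "n1"},
}
result = single_valued(deps, multi_sources={"p1"})
```

Starting from the optimistic assumption matters for loops: `i2`, `i3` and `c1` depend on each other cyclically and would never be found single-valued by a pessimistic iteration.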

Computing the synchronization sequence is straightforward given a control-flow graph for a function: The abstract
synchronization sequence from node n is defined to be the synchronization sequence executed from n to the function's
exit-point. This sequence respects the control-flow equation:

$$syncseq(n) = local\text{-}syncseq(n) \oplus \bigsqcup_{s \in succ(n)} syncseq(s)$$

where local-syncseq(n) is the abstract synchronization sequence executed at node n. The value of syncseq(n) can
be found by fixed-point iteration.
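
A minimal Python sketch of this fixed-point iteration follows. The lattice encoding is an assumption: sequences are strings over "b" and "r", `BOT` and `TOP` stand for $\bot$ and $\top$, and the exit node is given the value of its local sequence, since it has no successors to join over. The graph and node names are hypothetical.

```python
BOT, TOP = "BOT", "TOP"  # hypothetical encodings of the lattice extremes

def join(s, t):
    if s == BOT:
        return t
    if t == BOT:
        return s
    return s if s == t else TOP

def concat(s, t):
    if BOT in (s, t):
        return BOT
    if TOP in (s, t):
        return TOP
    return s + t

def syncseq(succ, local, exit_node):
    """Solve syncseq(n) = local(n) (+) join of syncseq over succ(n)."""
    seq = {n: BOT for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == exit_node:
                new = local[n]
            else:
                rest = BOT
                for s in succ[n]:
                    rest = join(rest, seq[s])
                new = concat(local[n], rest)
            if new != seq[n]:
                seq[n] = new
                changed = True
    return seq

# if (c) barrier(); else work();  followed by the function exit
succ = {"entry": ["cond"], "cond": ["then", "else"],
        "then": ["exit"], "else": ["exit"], "exit": []}
local = {"entry": "", "cond": "", "then": "b", "else": "", "exit": ""}
seq = syncseq(succ, local, "exit")
```

Each node's value can only rise from `BOT` to a concrete string and then to `TOP`, so the iteration terminates even when the graph contains loops.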

The final step is to verify that all the branches in the function are either single-valued (and correspond to
[If-Single]) or that they obey the same restriction as the [If-Multi] rule, i.e. both paths have executed the same explicit
synchronization sequence when they rejoin. The verification proceeds as follows for each multi-valued branch b of
the control-flow graph:

• If b is branch-dependent on itself then it must form part of a loop. This loop cannot contain any
synchronization statements, so local-syncseq(n) must be $\epsilon$ for all statements branch-dependent on b.

• Otherwise, b controls an if-like statement, and both paths must execute the same, known, synchronization
statement. This is verified by computing the syncseq function defined above, restricted to b and all statements
branch-dependent on b. The values of local-syncseq(n) for all other nodes n are temporarily considered to be
$\bot$. If syncseq(b) = $\top$ then branch b is invalid.


4.3.2  Other  Language  Features 

The other language features mentioned above do not require such complex changes. In the presence of pointer
values, detecting single-valued variables can require alias analysis, a well-known hard problem [15]. We have found
that very conservative assumptions suffice in practice (see Section 5.1): a variable whose address is taken is
multi-valued; any pointer dereference is multi-valued. Similar problems arise with function pointers, so we require that
all functions whose address is computed have synchronization sequence $\epsilon$, and we require that all visible variables
they assign are multi-valued.

When a data structure is initialised with a single-valued expression at creation, it remains single-valued so long
as all modifications are single-valued. Without initialization, detecting when all elements of a data structure are
single-valued is much harder. Therefore we mark uninitialized data structures as multi-valued.

In practice we have found that pointers and complex data structures are rarely used in conjunction with
synchronization. There are a few exceptions; in particular, in C programs command-line arguments are single-valued
pointers and strings in argv. Many programs parse argv to initialize some single-valued variables. For these
situations a mechanism is needed for the programmer to assert that a particular expression is single-valued. In the
tradition of C, we call this a single-valued cast. Use of this feature should of course be minimized.
4.3.3 Single keyword in C


The  single  keyword  proposed  in  Section  4.1  can  be  added  to  a  C-based  SPMD  language.  This  keyword  is  a  type 
qualifier,  like  const  or  volatile,  that  can  be  applied  to  any  part  of  a  type. 

There is however one important restriction: if any component of a type t is declared single, then t must be
implicitly considered to be single also. There are several reasons for this: first, a type such as “pointer
to single int” is not useful, as the results of dereferencing it are not single-valued, and modifications made via such
a pointer would violate the single-valuedness of the object pointed to. Secondly, a struct with a single field must
obey the single restrictions when used as the destination of an assignment. The name equivalence used by C for
struct types means that it is not possible to copy a structure with a single field to a structure that is identical
except that that field is not single. Finally, it is not possible to copy arrays. Hence there are no useful types with
a single component that are not themselves single.

We assume that all pointers are local and non-communicable: single is used to denote store whose computation
is replicated across all processors, and a remote pointer to single storage would allow this assumption to be violated.
In a language that has remote pointers, the type referred to must not have any single components.

$\mathcal{L}$ has only integer variables, so all the copies of a single variable have equal values. When only some fields of a
variable are single-valued it is inappropriate to talk of equality. Instead, we say that two variables are consistent if
they agree on the values of those parts that are declared single. Formally, we say that two values of a type t are
consistent if t is not single or:

• t is a base type and the values are equal (this is the only case addressed in $\mathcal{L}$).

• t is an array and all corresponding elements are consistent.

• t is a struct type and all single fields are consistent.

• t is a union type and the last assigned field is the same in both unions, and the values of that field are
consistent.

• t is a function pointer and both values are null or point to the same function.

• t is a pointer and both values are null or refer to an object of the same size, these objects are consistent, and
both pointers are at the same offset in this object.

The checking rules of Figure 5 extend naturally in this context. The $\sqsubseteq$ relation is replaced by the general rule that
type single $t \sqsubseteq t$. Casts involving single are allowed, but are unchecked. Similarly, there can be no check that
only the last assigned field of a union is read. Finally, all variables declared single must be initialised by single
values, to guarantee that such variables are initially consistent across processes.
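
The consistency relation can be sketched in Python for base types, arrays and structs; unions and pointers are omitted. The classes `Base`, `Array` and `Struct` and the functions `is_single` and `consistent` are hypothetical names introduced for this illustration. The single qualifier is recorded on base types only, and a composite type counts as single when any of its components does, following the restriction stated above.

```python
from dataclasses import dataclass

@dataclass
class Base:
    single: bool = False  # declared with the single qualifier?

@dataclass
class Array:
    elem: object          # element type

@dataclass
class Struct:
    fields: dict          # field name -> field type

def is_single(t):
    """A type is single if any of its components is declared single."""
    if isinstance(t, Base):
        return t.single
    if isinstance(t, Array):
        return is_single(t.elem)
    return any(is_single(ft) for ft in t.fields.values())

def consistent(t, v1, v2):
    """Do two values of type t agree on all parts declared single?"""
    if not is_single(t):
        return True
    if isinstance(t, Base):
        return v1 == v2
    if isinstance(t, Array):
        return len(v1) == len(v2) and all(
            consistent(t.elem, a, b) for a, b in zip(v1, v2))
    return all(consistent(ft, v1[name], v2[name])
               for name, ft in t.fields.items() if is_single(ft))

state = Struct({"steps": Base(single=True), "scratch": Base()})
same_steps = consistent(state, {"steps": 4, "scratch": 1},
                        {"steps": 4, "scratch": 2})
```

Two copies of `state` that differ only in the non-single field `scratch` are consistent, while copies that differ in `steps` are not.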


5  Experiments 

We implemented a prototype of our inference system for Split-C [5], an explicitly parallel extension to C. We tested
our prototype on Split-C kernels and applications. The empirical question we sought to answer is: How well does
barrier inference integrate with real SPMD programming? Our measure is the number of changes to preexisting
programs required to conform to our system. The results were promising: the checks were all successful with
minor changes, except for the exception handling aspects of one application. We also hand-examined the SPLASH-2
benchmarks and found that all but one would be checkable with our approach.
5.1 Split-C Prototype


For our purposes, the important features of Split-C are the barrier() and all_bcast() functions, which correspond
to the barrier and broadcast primitives of $\mathcal{L}$.

The prototype is a cross between a pure inference system and the language extensions proposed in Section 4.1: It
relies on a specification of the signatures of the functions and a list of the single-valued global variables, but infers
the single-valued local variables. It verifies that all specifications are correct.

Our implementation follows the guidelines outlined in Section 4.3 for supporting C, except that we have not yet
implemented the analysis of data structures (which was only needed by one of the Split-C programs). The algorithm
for inferring single-valued variables is similar to [1].

Table 1 presents the programs and summarizes the results of the checking process. The second column counts
the static occurrences of barriers in the program, while the third column reports the number of branches that
controlled the execution of a barrier and whose condition was single-valued. The function signature and
single-valued globals columns report the number of annotations that were necessary to check the program. The cases
that required modifications to the code are summarized in the ‘single-valued casts’ and ‘other changes’ columns.
Except for ‘svd’, all the casts are for values computed by parsing the program’s arguments (see Section 4.3). The
‘svd’ algorithm uses single-valued arrays (not supported by our prototype); this accounts for 18 of the 19 casts.
The last cast is due to a single-valued result being returned by reference. In C this implies taking the address of a
variable, and our system assumes that any variable whose address is taken is not single-valued.

The ‘barnes’ application includes exception handling (via setjmp), which is unchecked by our system (see footnote 4).
This application also required one small, local change: It broadcasts values without using the Split-C broadcast
primitives; we replaced this code with explicit broadcasts. One-line changes were needed in three programs, ‘mm’,
‘wator’ and ‘nbody’. In these programs it was necessary to avoid taking the address of single-valued variables which
were read with scanf. The second change in ‘nbody’ was to correct a minor bug detected by our prototype: when
unexpected arguments were supplied only some processes exited.

These results show that our system is successful in verifying existing Split-C applications, with few changes and
annotations. All but one of the programs depend on single-valued branches, which implies that conditional
synchronization is the rule and not the exception in SPMD programs, and therefore that analysis of single-valued
variables is necessary. The analysis time is low enough that our system can be integrated into an existing compiler
without a large impact on execution time (the times, measured on an HP 715/80, represent the time spent in our
system; they do not include the time to build the standard SSA representation used by our prototype).

5.2  The  SPLASH-2  Benchmarks 

As a further validation of our approach, we examined the synchronization structure of the SPLASH-2
benchmarks [25], which are written in C extended with macros for writing parallel programs. The facilities provided by
the macros include named barrier synchronization. Process management is with a fork/join model, but all but one
of the programs are effectively written in an SPMD style with all processes executing the same code (except for
initialization). The exception is the ‘radiosity’ application; as it is outside our model we cannot check it.

Our  implementation  is  written  for  Split-C  and  therefore  does  not  check  the  SPLASH-2  programs.  We  examined 
the  SPLASH-2  programs  by  hand  to  see  if  a  suitably  modified  system  would  be  able  to  check  these  programs. 
The  results  of  this  examination  are  given  in  Table  2.  The  four  kernels  and  all  but  one  of  the  applications  pose  no 
particular  problems  for  our  inference  system. 

4  Checking  use  of  setjmp  and  longjmp  in  C  is  almost  impossible  in  any  program  analysis.  In  the  ‘barnes’  application,  when  an 
exception  arises  in  one  process,  the  whole  program  is  terminated. 
Program    Lines   Number of   Single-valued   Function     Single-valued   Single-valued   Other     Analysis
                   barriers    branches        signatures   globals         casts           changes   time
cannon     501     17          1               1            -               -               -         0.3s
cg         453     18          2               3            -               -               -         0.1s
cholesky   1542    38          16              4            -               2               -         2.3s
column     651     7           3               1            -               -               -         0.1s
fft3d      1181    12          5               1            -               1               -         0.1s
mm         508     23          1               1            -               -               1         0.2s
radix      379     7           3               -            -               2               -         0.1s
sample     302     9           0               -            -               -               -         0.1s
svd        1395    1           23              13           9               19 (or 1) [a]   -         0.2s
wator      348     10          5               -            3               -               2         0.1s
nbody      546     7           6               -            2               3               2         0.3s
em3d       1080    16          1               -            -               -               -         0.3s
barnes     2804    73          17              2            6               7               2         0.6s

[a] 18 of the 19 casts are required because of the lack of support of single-valued arrays.

Kernels:

• column, sample, radix: Sorting programs.

• cannon: Matrix multiplication using Cannon's algorithm.

• cg: Solves a set of equations using the conjugate gradient method.

• cholesky: Seven different implementations of Cholesky decomposition.

• fft3d: A 3-dimensional fast Fourier transform.

• mm: Matrix-multiply, blocked or unblocked.

• svd: Singular-value decomposition, using the Lanczos algorithm.

Applications:

• wator: Simulation of particle-like fish under current.

• nbody: A simple n body simulation code.

• em3d: 3-dimensional electro-magnetic simulation, described in [13].

• barnes: Simulate the interaction of a system of n bodies using the Barnes-Hut hierarchical method.

Table 1: Results of checking Split-C programs

6  Related  Work 

There  are  four  strands  of  related  work:  SIMD  (Single  Instruction,  Multiple  Data)  languages,  synchronization 
analysis,  binding-time  analysis,  and  effect  systems. 

SIMD Languages divide variables into control unit and processing unit variables. Control unit variables resemble
our single-valued variables: they are variables that have only one value. Unlike single-valued variables, control unit
variables are stored in only one location. Control unit variables are declared with a CU keyword in the Illiac IV
programming language Glypnir [16]. The Connection Machine language C* [23] calls these variables scalar. There
is no equivalent of our inference system for these languages, as the properties we are inferring are guaranteed by
SIMD semantics. Our proposed single keyword provides similar advantages for SPMD languages.

The ELP language [21] [24], a joint SIMD/SPMD programming language where both “modes” have the same
semantics, allows declaration of single-valued variables with a mono keyword. When in SPMD mode the compiler
guarantees that the single-valued property is preserved, presumably using rules similar to ours (the paper does not
give many details on the checking strategy). ELP does not include explicit barriers or language-level broadcast, so
there is no equivalent to our verification of synchronization. The programming model is also very different.

Program            Lines   Number of barriers   Can be checked
ocean               2954   19                   yes - needs single-valued array
                    4703   20                   inference (both versions)
barnes              2078    6                   yes
fmm                 3800   13                   yes
radiosity          11319    5                   no - not pure SPMD
raytrace           10020    1                   yes
water               1744    9                   yes
                    2971    9                   (both versions)
volrend             3704   13                   yes
kernel cholesky     5050    4                   yes
kernel fft          1005    7                   yes
kernel lu            988    5                   yes
                     763    5                   (both versions)
kernel radix         879    7                   yes

Table 2: Results of examining the SPLASH-2 benchmarks

Analysis  of  the  synchronization  of  parallel  programs  has  been  extensively  studied  for  the  purposes  of  deadlock  and 
data-race  detection  as  well  as  for  optimisation.  Our  survey  of  this  work  is  necessarily  partial,  and  covers  only  static 
techniques. 

Jeremiassen and Eggers [11] analyse barrier synchronization for SPMD programs to improve the precision of
optimisation. They do not attempt to verify the correctness of the synchronization. Their analysis relies on named barriers
for precision and does not consider single-valued variables, though they do consider dependencies on multi-valued
constants like pid [10].

A number of papers analyse 2-way synchronization, such as post/wait or the accept/call mechanism of Ada, between
explicitly specified tasks. As each task is specified with different code, there is no real analogue of single-valued
variables.  Analyzing  synchronization  in  this  context  is  similar  to  analyzing  the  synchronization  between  the  two 
branches  in  the  [If-Multi]  case,  for  which  we  only  allow  very  simple  synchronization  sequences.  None  of  the  following 
papers  present  exact  solutions  for  more  general  situations. 

One  technique  is  to  build  a  concurrency  graph  where  nodes  represent  parallel  program  states,  and  edges  represent 
synchronization  or  other  state  modifications.  Taylor  [22]  considers  only  control-flow  and  the  resulting  graph  can 
be  exponential  in  the  number  of  tasks.  Young  and  Taylor  [26]  attempt  to  increase  the  precision  of  the  concurrency 
graph  by  employing  symbolic  execution.  Helmbold  and  McDowell  [9]  and  McDowell  [19]  include  data  values  in  the 
concurrency  states,  and  discuss  a  number  of  techniques  for  reducing  the  number  of  nodes. 

A  different  approach  is  to  determine  which  statements  are  executed  before  others,  based  on  the  synchronization 
statements.  Callahan  and  Subhlok  [2]  and  Callahan,  Kennedy  and  Subhlok  [3]  compute  an  approximation  of  this 
relation  and  extend  it  with  dependence  distance  information  for  loops.  Masticola  and  Ryder  [18]  employ  this 
information,  along  with  other  techniques,  to  compute  a  “can’t  happen  together”  relation  for  statements. 

As mentioned in Section 4.3, inference of single-valued variables is similar to binding-time analysis [12]. The main
difference is that we do not require that these values be directly computable from the initial set. Our single-valued
variable inference algorithm is close to that presented by Auslander et al. [1]. There is a difference in the handling
of control-flow dependencies, and of course the purpose is unrelated.

Barrier Inference is an example of an effect system [7], where the effects are synchronization sequences and the type
of a variable represents its single-valuedness.




7  Conclusion 


We  have  identified  an  important  property  of  SPMD  programs  that  current  languages  do  not  explicitly  support: 
The  portion  of  control  and  data  flow  governing  global  synchronization  is  identical  across  all  the  processes.  This 
synchronization  kernel  structures  the  entire  application.  We  have  developed  an  inference  system  that  both  detects 
this  structure  and  verifies  that  global  synchronization  is  correct.  An  implementation  of  this  system  for  Split-C 
successfully  checks  a  number  of  programs. 

The  synchronization  kernel  is  sufficiently  important  that  it  should  be  explicitly  visible  in  source  code.  We  propose 
language  features  that  make  SPMD  programs  clearer  and  easier  to  check. 

We are integrating these language extensions into Titanium, a successor of Split-C based on Java [8]. This requires
extending the application of the single-valued concept to more complex data structures, including references and
objects, and handling language features such as exception handling. We are also working on an algorithm that
uses the results of our inference system to represent the portions of the code that may be executing simultaneously
so that SPMD optimisations, e.g. [14], may be more precise.
A  Soundness


A.1  Proof of Lemma 3.2

The proof uses three simple lemmas. The first lemma states that all the continuations introduced by the rewrite
rules are eventually applied. The second lemma asserts that if two evaluations have identical broadcast sequences,
and a prefix of both evaluations has the same synchronization sequence, then the remaining steps of
both evaluations have the same broadcast sequence. Taken together, these two lemmas allow the rewrite rules to
be broken into pieces so that an induction on the length of a rewrite sequence can be applied to all evaluations.


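Lemma A.1 Let $e$ be any expression. If $[\langle F, E, C, e\rangle] \rightarrow [\langle F, E, C', e'\rangle] \rightarrow^{*}_{t} [C(E', i)]$ then $[\langle F, E, C', e'\rangle] \rightarrow^{*}_{t_1} [C'(E'', i')]$, $[C'(E'', i')] \rightarrow^{*}_{t_2} [C(E', i)]$, and $t = t_1 \oplus t_2$.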

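Lemma A.2 If the broadcast sequences of

\[ [\langle F, E_1, C_1, e_1\rangle] \rightarrow^{*}_{t} [\langle F, E'_1, C'_1, e'_1\rangle] \rightarrow^{*}_{t_1} [\langle F, E''_1, C''_1, e''_1\rangle] \]

\[ [\langle F, E_2, C_2, e_2\rangle] \rightarrow^{*}_{t} [\langle F, E'_2, C'_2, e'_2\rangle] \rightarrow^{*}_{t_2} [\langle F, E''_2, C''_2, e''_2\rangle] \]

are identical, then the broadcast sequences of

\[ [\langle F, E'_1, C'_1, e'_1\rangle] \rightarrow^{*}_{t_1} [\langle F, E''_1, C''_1, e''_1\rangle] \]

\[ [\langle F, E'_2, C'_2, e'_2\rangle] \rightarrow^{*}_{t_2} [\langle F, E''_2, C''_2, e''_2\rangle] \]

are also identical.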

The  third  lemma  says  that  if  e  does  not  assign  to  x  directly  or  via  a  function  call,  then  x’s  value  is  unchanged  by 
evaluation  of  e. 

Lemma A.3 Let $e$ be any expression, $E$ any environment and $F$ the free functions at $e$. If $[\langle F, E, C, e\rangle] \rightarrow^{*} [C(E', i)]$

then $\forall x \in \mathrm{dom}(E).\ x \notin AV(e) \Rightarrow E(x) = E'(x)$.

The proof of Lemma 3.2 proceeds by induction on the length of the rewrite sequence representing the evaluation
of $e$. For each expression $e$, we can assume that $B, A \vdash e : a, A', s$, that $E_1 \approx_{A} E_2$, $F : B$, that

\[ [\langle F, E_1, C_1, e\rangle] \rightarrow^{*}_{t_1} [C_1(E'_1, i_1)] \]

\[ [\langle F, E_2, C_2, e\rangle] \rightarrow^{*}_{t_2} [C_2(E'_2, i_2)] \]

and that the broadcast sequences of both evaluations are identical. We must show that

•  $t_1 = t_2$ and $t_2 \sqsubseteq s$

•  $E'_1 \approx_{A'} E'_2$

•  $a = + \Rightarrow i_1 = i_2$

We  consider  each  rewrite  rule  in  turn. 
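•  $i$: From the semantics and inference rules, we know

\[ B, A \vdash i : +, A, \varepsilon \]

\[ [\langle F, E_1, C_1, i\rangle] \rightarrow_{\varepsilon} [C_1(E_1, i)] \]

\[ [\langle F, E_2, C_2, i\rangle] \rightarrow_{\varepsilon} [C_2(E_2, i)] \]

•  $id$: From the semantics and inference rules, we know

\[ B, A \vdash id : A(id), A, \varepsilon \]

\[ [\langle F, E_1, C_1, id\rangle] \rightarrow_{\varepsilon} [C_1(E_1, E_1(id))] \]

\[ [\langle F, E_2, C_2, id\rangle] \rightarrow_{\varepsilon} [C_2(E_2, E_2(id))] \]

So $A(id) = + \Rightarrow E_1(id) = E_2(id)$.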

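•  communicate: From the semantics and inference rules, we know

\[ B, A \vdash \texttt{communicate} : -, A, \varepsilon \]

\[ [\langle F, E_1, C_1, \texttt{communicate}\rangle] \rightarrow_{\varepsilon} [C_1(E_1, \mathit{oracle}())] \]

\[ [\langle F, E_2, C_2, \texttt{communicate}\rangle] \rightarrow_{\varepsilon} [C_2(E_2, \mathit{oracle}())] \]

The two different calls to $\mathit{oracle}()$ may return different values, but $a = -$.

•  barrier: From the semantics and inference rules, we know

\[ B, A \vdash \texttt{barrier} : +, A, b \]

\[ [\langle F, E_1, C_1, \texttt{barrier}\rangle] \rightarrow_{b} [C_1(E_1, 0)] \]

\[ [\langle F, E_2, C_2, \texttt{barrier}\rangle] \rightarrow_{b} [C_2(E_2, 0)] \]

•  broadcast: From the semantics, inference rules and the fact that the broadcast sequences of both evaluations
are identical, we know

\[ B, A \vdash \texttt{broadcast} : +, A, r \]

\[ [\langle F, E_1, C_1, \texttt{broadcast}\rangle] \rightarrow_{r} [C_1(E_1, x)] \]

\[ [\langle F, E_2, C_2, \texttt{broadcast}\rangle] \rightarrow_{r} [C_2(E_2, x)] \]

where $x = \mathit{oracle}()$.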

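•  $x \leftarrow \mathit{Expr}$: From the semantics, by hypothesis and Lemmas A.1 and A.2

\[ [\langle F, E_1, C_1, x \leftarrow \mathit{Expr}\rangle] \rightarrow_{\varepsilon} [\langle F, E_1, C'_1, \mathit{Expr}\rangle] \rightarrow^{*}_{t_1} [C'_1(E'_1, i_1)] = [C_1(E'_1[x \leftarrow i_1], i_1)] \]

\[ [\langle F, E_2, C_2, x \leftarrow \mathit{Expr}\rangle] \rightarrow_{\varepsilon} [\langle F, E_2, C'_2, \mathit{Expr}\rangle] \rightarrow^{*}_{t_2} [C'_2(E'_2, i_2)] = [C_2(E'_2[x \leftarrow i_2], i_2)] \]

with $C'_1 = \lambda(E', v).C_1(E'[x \leftarrow v], v)$ and $C'_2 = \lambda(E', v).C_2(E'[x \leftarrow v], v)$

The inference rule states

\[ \frac{B, A \vdash \mathit{Expr} : a, A', s}{B, A \vdash x \leftarrow \mathit{Expr} : a, A'[x \leftarrow a], s} \]

By induction, we know that: $t_1 = t_2$, $t_2 \sqsubseteq s$, $E'_1 \approx_{A'} E'_2$ and $a = + \Rightarrow i_1 = i_2$. It follows that
$E'_1[x \leftarrow i_1] \approx_{A'[x \leftarrow a]} E'_2[x \leftarrow i_2]$.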

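•  let $x$ in $\mathit{Expr}$: From the semantics, by hypothesis and Lemmas A.1 and A.2

\[ [\langle F, E_1, C_1, \texttt{let } x \texttt{ in } \mathit{Expr}\rangle] \rightarrow_{\varepsilon} [\langle F, E_1[x \leftarrow 0], C'_1, \mathit{Expr}\rangle] \rightarrow^{*}_{t_1} [C'_1(E'_1, i_1)] = [C_1(E'_1 /\!/ \{x\}, i_1)] \]

\[ [\langle F, E_2, C_2, \texttt{let } x \texttt{ in } \mathit{Expr}\rangle] \rightarrow_{\varepsilon} [\langle F, E_2[x \leftarrow 0], C'_2, \mathit{Expr}\rangle] \rightarrow^{*}_{t_2} [C'_2(E'_2, i_2)] = [C_2(E'_2 /\!/ \{x\}, i_2)] \]

with $C'_1 = \lambda(E', v).C_1(E' /\!/ \{x\}, v)$ and $C'_2 = \lambda(E', v).C_2(E' /\!/ \{x\}, v)$

The inference rule states

\[ \frac{B, A[x \leftarrow +] \vdash \mathit{Expr} : a, A', s}{B, A \vdash \texttt{let } x \texttt{ in } \mathit{Expr} : a, A' /\!/ \{x\}, s} \]

By induction, we know that: $t_1 = t_2$, $t_2 \sqsubseteq s$, $E'_1 \approx_{A'} E'_2$ and $a = + \Rightarrow i_1 = i_2$. It follows that
$E'_1 /\!/ \{x\} \approx_{A' /\!/ \{x\}} E'_2 /\!/ \{x\}$.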

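•  $\mathit{Expr}_1; \mathit{Expr}_2$: From the semantics, by hypothesis and Lemmas A.1 and A.2

\[ [\langle F, E_1, C_1, \mathit{Expr}_1; \mathit{Expr}_2\rangle] \rightarrow_{\varepsilon} [\langle F, E_1, C'_1, \mathit{Expr}_1\rangle] \rightarrow^{*}_{t^1_1} [C'_1(E'_1, i_1)] = [\langle F, E'_1, C_1, \mathit{Expr}_2\rangle] \rightarrow^{*}_{t^1_2} [C_1(E''_1, j_1)] \]

\[ [\langle F, E_2, C_2, \mathit{Expr}_1; \mathit{Expr}_2\rangle] \rightarrow_{\varepsilon} [\langle F, E_2, C'_2, \mathit{Expr}_1\rangle] \rightarrow^{*}_{t^2_1} [C'_2(E'_2, i_2)] = [\langle F, E'_2, C_2, \mathit{Expr}_2\rangle] \rightarrow^{*}_{t^2_2} [C_2(E''_2, j_2)] \]

with $C'_1 = \lambda(E', v).\langle F, E', C_1, \mathit{Expr}_2\rangle$ and $C'_2 = \lambda(E', v).\langle F, E', C_2, \mathit{Expr}_2\rangle$

The inference rule states

\[ \frac{B, A_0 \vdash \mathit{Expr}_1 : a_1, A_1, s_1 \qquad B, A_1 \vdash \mathit{Expr}_2 : a_2, A_2, s_2}{B, A_0 \vdash \mathit{Expr}_1; \mathit{Expr}_2 : a_2, A_2, s_1 \oplus s_2} \]

Applying the induction hypothesis to $\mathit{Expr}_1$, we know that: $t^1_1 = t^2_1$, $t^2_1 \sqsubseteq s_1$, $E'_1 \approx_{A_1} E'_2$. The last
fact completes the hypotheses of the induction for $\mathit{Expr}_2$ so we can also conclude that: $t^1_2 = t^2_2$, $t^2_2 \sqsubseteq s_2$,
$E''_1 \approx_{A_2} E''_2$ and $a_2 = + \Rightarrow j_1 = j_2$. So $t^1_1 \oplus t^1_2 = t^2_1 \oplus t^2_2$ and $t^2_1 \oplus t^2_2 \sqsubseteq s_1 \oplus s_2$.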

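•  $f(\mathit{Expr}_1, \ldots, \mathit{Expr}_n)$: The arguments are evaluated in sequence, so the same inductive reasoning as for
$\mathit{Expr}_1; \mathit{Expr}_2$ gives

\[ [\langle F, E_1, C_1, f(\mathit{Expr}_1, \ldots, \mathit{Expr}_n)\rangle] \rightarrow_{\varepsilon} [\langle F, E_1, C^1_1, \mathit{Expr}_1\rangle] \rightarrow^{*}_{t^1_1} [C^1_1(E^1_2, v^1_1)] \rightarrow \cdots \rightarrow^{*}_{t^1_n} [C^1_n(E^1_{n+1}, v^1_n)] \rightarrow^{*}_{t^1} [C_1(E''_1, v^1)] \]

\[ [\langle F, E_2, C_2, f(\mathit{Expr}_1, \ldots, \mathit{Expr}_n)\rangle] \rightarrow_{\varepsilon} [\langle F, E_2, C^2_1, \mathit{Expr}_1\rangle] \rightarrow^{*}_{t^2_1} [C^2_1(E^2_2, v^2_1)] \rightarrow \cdots \rightarrow^{*}_{t^2_n} [C^2_n(E^2_{n+1}, v^2_n)] \rightarrow^{*}_{t^2} [C_2(E''_2, v^2)] \]

with $F(f) = f(x_1, \ldots, x_n) = \mathit{Expr}$,

$C^1_1 = \lambda(E_2, v).\langle F, E_2, C^1_2, \mathit{Expr}_2\rangle, \ldots, C^1_n = \lambda(E_{n+1}, v_n).\langle F|_{FF(f)}, E^1_0, C'_1, \mathit{Expr}\rangle$

$E^1_0 = (E_{n+1}|_{FV(f)})[x_1 \leftarrow v_1, \ldots, x_n \leftarrow v_n]$

$C'_1 = \lambda(E', v).C_1((E_{n+1} /\!/ FV(f) + E' /\!/ \{x_1, \ldots, x_n\}), v)$

and $C^2_1 = \lambda(E_2, v).\langle F, E_2, C^2_2, \mathit{Expr}_2\rangle, \ldots, C^2_n = \lambda(E_{n+1}, v_n).\langle F|_{FF(f)}, E^2_0, C'_2, \mathit{Expr}\rangle$

$E^2_0 = (E_{n+1}|_{FV(f)})[x_1 \leftarrow v_1, \ldots, x_n \leftarrow v_n]$

$C'_2 = \lambda(E', v).C_2((E_{n+1} /\!/ FV(f) + E' /\!/ \{x_1, \ldots, x_n\}), v)$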
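The inference rule states

\[
\frac{\begin{array}{c}
B, A_0 \vdash \mathit{Expr}_1 : a_1, A_1, s_1 \\
\vdots \\
B, A_{n-1} \vdash \mathit{Expr}_n : a_n, A_n, s_n \\
B(f) = (a'_1, \ldots, a'_n), A \rightarrow a, A', s \\
A_n|_{\mathrm{dom}(A)} \sqsubseteq A \\
\forall 1 \le i \le n.\ a_i \sqsubseteq a'_i
\end{array}}{B, A_0 \vdash f(\mathit{Expr}_1, \ldots, \mathit{Expr}_n) : a, A_n /\!/ \mathrm{dom}(A') + A', s_1 \oplus \cdots \oplus s_n \oplus s}
\]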


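Applying the induction hypothesis $n$ times, we conclude: $t^1_i = t^2_i$, $t^2_i \sqsubseteq s_i$, $a_i = + \Rightarrow v^1_i = v^2_i$, $E^1_{n+1} \approx_{A_n} E^2_{n+1}$.

We also know that $F : B$, i.e. $B|_{FF(f)}, A[x_1 \leftarrow a'_1, \ldots, x_n \leftarrow a'_n] \vdash \mathit{Expr} : a, A'', s$ and $A' = A'' /\!/ \{x_1, \ldots, x_n\}$.
As all function names in $L$ are unique, $F|_{FF(f)} : B|_{FF(f)}$. From above

\[ [C^1_n(E^1_{n+1}, v^1_n)] = [\langle F|_{FF(f)}, E^1_0, C'_1, \mathit{Expr}\rangle] \rightarrow^{*}_{t^1} [C'_1(E'_1, v^1)] = [C_1(E''_1, v^1)] \]

\[ [C^2_n(E^2_{n+1}, v^2_n)] = [\langle F|_{FF(f)}, E^2_0, C'_2, \mathit{Expr}\rangle] \rightarrow^{*}_{t^2} [C'_2(E'_2, v^2)] = [C_2(E''_2, v^2)] \]

with $E''_1 = E^1_{n+1} /\!/ FV(f) + E'_1 /\!/ \{x_1, \ldots, x_n\}$ and $E''_2 = E^2_{n+1} /\!/ FV(f) + E'_2 /\!/ \{x_1, \ldots, x_n\}$

The hypothesis of the induction is thus verified, so $t^1 = t^2$, $t^2 \sqsubseteq s$, $a = + \Rightarrow v^1 = v^2$, $E'_1 \approx_{A''} E'_2$.

As

-  $t^1_1 \oplus \cdots \oplus t^1_n \oplus t^1 = t^2_1 \oplus \cdots \oplus t^2_n \oplus t^2 \sqsubseteq s_1 \oplus \cdots \oplus s_n \oplus s$

-  $a = + \Rightarrow v^1 = v^2$

-  By definition of the abstract function environment $B$, $FV(f) = \mathrm{dom}(A) = \mathrm{dom}(A')$. So, given that
$E'_1 \approx_{A''} E'_2$ and $E^1_{n+1} \approx_{A_n} E^2_{n+1}$, it follows that $E''_1 \approx_{A_n /\!/ \mathrm{dom}(A') + A'} E''_2$.

the lemma is verified for this case.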

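•  $p(\mathit{Expr}_1, \ldots, \mathit{Expr}_n)$: The arguments are evaluated in sequence, so the same inductive reasoning as above
gives

\[ [\langle F, E_1, C_1, p(\mathit{Expr}_1, \ldots, \mathit{Expr}_n)\rangle] \rightarrow_{\varepsilon} [\langle F, E_1, C^1_1, \mathit{Expr}_1\rangle] \rightarrow^{*}_{t^1_1} \cdots \rightarrow^{*}_{t^1_n} [C^1_n(E^1_{n+1}, v^1_n)] = [C_1(E^1_{n+1}, p(v^1_1, \ldots, v^1_n))] \]

\[ [\langle F, E_2, C_2, p(\mathit{Expr}_1, \ldots, \mathit{Expr}_n)\rangle] \rightarrow_{\varepsilon} [\langle F, E_2, C^2_1, \mathit{Expr}_1\rangle] \rightarrow^{*}_{t^2_1} \cdots \rightarrow^{*}_{t^2_n} [C^2_n(E^2_{n+1}, v^2_n)] = [C_2(E^2_{n+1}, p(v^2_1, \ldots, v^2_n))] \]

with $C^1_1 = \lambda(E_2, v).\langle F, E_2, C^1_2, \mathit{Expr}_2\rangle, \ldots, C^1_n = \lambda(E_{n+1}, v_n).C_1(E_{n+1}, p(v_1, \ldots, v_n))$
and $C^2_1 = \lambda(E_2, v).\langle F, E_2, C^2_2, \mathit{Expr}_2\rangle, \ldots, C^2_n = \lambda(E_{n+1}, v_n).C_2(E_{n+1}, p(v_1, \ldots, v_n))$

The inference rule states

\[
\frac{\begin{array}{c}
B, A_0 \vdash \mathit{Expr}_1 : a_1, A_1, s_1 \\
\vdots \\
B, A_{n-1} \vdash \mathit{Expr}_n : a_n, A_n, s_n
\end{array}}{B, A_0 \vdash p(\mathit{Expr}_1, \ldots, \mathit{Expr}_n) : a_1 \sqcup \cdots \sqcup a_n, A_n, s_1 \oplus \cdots \oplus s_n}
\]

By induction, $t^1_i = t^2_i$, $t^2_i \sqsubseteq s_i$, $a_i = + \Rightarrow v^1_i = v^2_i$, $E^1_{n+1} \approx_{A_n} E^2_{n+1}$. As the value of $p$ depends only on its
arguments, $a_1 \sqcup \cdots \sqcup a_n = + \Rightarrow p(v^1_1, \ldots, v^1_n) = p(v^2_1, \ldots, v^2_n)$.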
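•  letrec $f(x_1, \ldots, x_n) = \mathit{Expr}_1$ in $\mathit{Expr}_2$: From the semantics, by hypothesis and Lemmas A.1 and A.2

\[ [\langle F, E_1, C_1, \texttt{letrec } f(x_1, \ldots, x_n) = \mathit{Expr}_1 \texttt{ in } \mathit{Expr}_2\rangle] \rightarrow_{\varepsilon} [\langle G, E_1, C_1, \mathit{Expr}_2\rangle] \rightarrow^{*}_{t_1} [C_1(E'_1, v_1)] \]

\[ [\langle F, E_2, C_2, \texttt{letrec } f(x_1, \ldots, x_n) = \mathit{Expr}_1 \texttt{ in } \mathit{Expr}_2\rangle] \rightarrow_{\varepsilon} [\langle G, E_2, C_2, \mathit{Expr}_2\rangle] \rightarrow^{*}_{t_2} [C_2(E'_2, v_2)] \]

with $G = F[f \leftarrow f(x_1, \ldots, x_n) = \mathit{Expr}_1]$

The inference rule states

\[
\frac{\begin{array}{c}
\mathrm{dom}(A) = \mathrm{dom}(A') = \mathrm{dom}(A_0) \\
S = (a_1, \ldots, a_n), A \rightarrow a, A', s \\
A' = A'' /\!/ \{x_1, \ldots, x_n\} \\
B[f \leftarrow S], A[x_1 \leftarrow a_1, \ldots, x_n \leftarrow a_n] \vdash \mathit{Expr}_1 : a, A'', s \\
B[f \leftarrow S], A_0 \vdash \mathit{Expr}_2 : a_2, A_2, s_2
\end{array}}{B, A_0 \vdash \texttt{letrec } f(x_1, \ldots, x_n) = \mathit{Expr}_1 \texttt{ in } \mathit{Expr}_2 : a_2, A_2, s_2}
\]

So $G : B[f \leftarrow S]$, therefore the induction hypothesis applies, and $t_1 = t_2$, $t_2 \sqsubseteq s_2$, $a_2 = + \Rightarrow v_1 = v_2$,
$E'_1 \approx_{A_2} E'_2$.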

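•  if $\mathit{Expr}_1$ $\mathit{Expr}_2$ else $\mathit{Expr}_3$: From the semantics, by hypothesis and Lemmas A.1 and A.2

\[ [\langle F, E_1, C_1, \texttt{if } \mathit{Expr}_1\ \mathit{Expr}_2 \texttt{ else } \mathit{Expr}_3\rangle] \rightarrow_{\varepsilon} [\langle F, E_1, C^1_0, \mathit{Expr}_1\rangle] \rightarrow^{*}_{t^1_1} [C^1_0(E'_1, v_1)] \rightarrow^{*}_{t^1_2} [C_1(E''_1, v'_1)] \]

\[ [\langle F, E_2, C_2, \texttt{if } \mathit{Expr}_1\ \mathit{Expr}_2 \texttt{ else } \mathit{Expr}_3\rangle] \rightarrow_{\varepsilon} [\langle F, E_2, C^2_0, \mathit{Expr}_1\rangle] \rightarrow^{*}_{t^2_1} [C^2_0(E'_2, v_2)] \rightarrow^{*}_{t^2_2} [C_2(E''_2, v'_2)] \]

with $C^1_0 = \lambda(E', v).\langle F, E', C_1, \text{if } v = 0 \text{ then } \mathit{Expr}_2 \text{ else } \mathit{Expr}_3\rangle$
and $C^2_0 = \lambda(E', v).\langle F, E', C_2, \text{if } v = 0 \text{ then } \mathit{Expr}_2 \text{ else } \mathit{Expr}_3\rangle$

The inference rule applied to this construction is either [If-Single] or [If-Multi]. If the rule is [If-Single]

\[
\frac{\begin{array}{c}
B, A_0 \vdash \mathit{Expr}_1 : +, A_1, s_1 \\
B, A_1 \vdash \mathit{Expr}_2 : a_2, A_2, s_2 \\
B, A_1 \vdash \mathit{Expr}_3 : a_3, A_3, s_3
\end{array}}{B, A_0 \vdash \texttt{if } \mathit{Expr}_1\ \mathit{Expr}_2 \texttt{ else } \mathit{Expr}_3 : a_2 \sqcup a_3, A_2 \sqcup A_3, s_1 \oplus (s_2 \sqcup s_3)}
\]

By induction, $v_1 = v_2$, $t^1_1 = t^2_1 \sqsubseteq s_1$ and $E'_1 \approx_{A_1} E'_2$, so the applications of $C^1_0$ and $C^2_0$ return states that
evaluate the same expression. If $v_1 = v_2 = 0$, we get

\[ [C^1_0(E'_1, 0)] = [\langle F, E'_1, C_1, \mathit{Expr}_2\rangle] \rightarrow^{*}_{t^1_2} [C_1(E''_1, v'_1)] \]

\[ [C^2_0(E'_2, 0)] = [\langle F, E'_2, C_2, \mathit{Expr}_2\rangle] \rightarrow^{*}_{t^2_2} [C_2(E''_2, v'_2)] \]

The hypothesis of the induction is satisfied, so $t^1_2 = t^2_2 \sqsubseteq s_2$, $E''_1 \approx_{A_2} E''_2$ and $a_2 = + \Rightarrow v'_1 = v'_2$.

The case for $v_1 = v_2 \neq 0$ is similar. It therefore follows that $t^1_1 \oplus t^1_2 = t^2_1 \oplus t^2_2 \sqsubseteq s_1 \oplus (s_2 \sqcup s_3)$, $a_2 \sqcup a_3 = + \Rightarrow
v'_1 = v'_2$ and $E''_1 \approx_{A_2 \sqcup A_3} E''_2$, so the lemma is verified for this case.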

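If the inference rule is [If-Multi]

\[
\frac{\begin{array}{c}
B, A_0 \vdash \mathit{Expr}_1 : -, A_1, s_1 \\
B, A_1 \vdash \mathit{Expr}_2 : a_2, A_2, s_2 \\
B, A_1 \vdash \mathit{Expr}_3 : a_3, A_3, s_3 \\
s_2 \sqcup s_3 \prec f \\
A' = A_1 < (AV(\mathit{Expr}_2) \cup AV(\mathit{Expr}_3))
\end{array}}{B, A_0 \vdash \texttt{if } \mathit{Expr}_1\ \mathit{Expr}_2 \texttt{ else } \mathit{Expr}_3 : -, A', s_1 \oplus (s_2 \sqcup s_3)}
\]

If $v_1$ and $v_2$ are both equal to or both different from 0, then the [If-Multi] case behaves like the [If-Single] case, and the
lemma is easily verified as the $<$ operator only weakens the requirements.

Assuming, with no loss of generality, that $v_1 = 0$ and $v_2 \neq 0$, the lemma is applied independently to each
expression

\[ [C^1_0(E'_1, v_1)] = [\langle F, E'_1, C_1, \mathit{Expr}_2\rangle] \rightarrow^{*}_{t^1_2} [C_1(E''_1, v'_1)] \]

\[ [C^2_0(E'_2, v_2)] = [\langle F, E'_2, C_2, \mathit{Expr}_3\rangle] \rightarrow^{*}_{t^2_2} [C_2(E''_2, v'_2)] \]

By induction, we get: $t^1_2 \sqsubseteq s_2$ and $t^2_2 \sqsubseteq s_3$. Also $t^1_2$ and $t^2_2$ are strings in $\{b, r\}^{*}$, so $\bot \sqsubset t^1_2 \sqsubseteq s_2$, $\bot \sqsubset t^2_2 \sqsubseteq s_3$,
and $s_2 \sqcup s_3 \prec f \Rightarrow t^1_2 = s_2 = s_3 = t^2_2 = s_2 \sqcup s_3$.

We have $E'_1 \approx_{A_1} E'_2$. By Lemma A.3

\[ \forall x \notin AV(\mathit{Expr}_2).\ E'_1(x) = E''_1(x) \]

\[ \forall x \notin AV(\mathit{Expr}_3).\ E'_2(x) = E''_2(x) \]

\[ \Rightarrow \forall x \notin (AV(\mathit{Expr}_2) \cup AV(\mathit{Expr}_3)).\ E'_1(x) = E'_2(x) \Rightarrow E''_1(x) = E''_2(x) \]

\[ \Rightarrow E''_1 \approx_{A_1 < (AV(\mathit{Expr}_2) \cup AV(\mathit{Expr}_3))} E''_2 \]

so the lemma is verified for this case.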


A.2  Proof of Theorem 3.3

The following simple lemma asserts that if an evaluation terminates, then the special continuation $I$ must have
been evaluated:


Lemma A.4 Let $e$ be any expression, $E$ any environment and $F$ any function environment. If

\[ [\langle F, E, I, e\rangle] \rightarrow^{*} [(E', v)] \]

then

\[ [\langle F, E, I, e\rangle] \rightarrow^{*} [I(E', v)] \]


To prove Theorem 3.3 we must show that given an expression $e$, a proof $B, A \vdash e : a, A', s$, and environments
$F, E_1, \ldots, E_n$ such that $F : B$ and $E_i \approx_{A} E_j$ for $i, j = 1..n$, either

\[ [\langle F, E_1, I, e\rangle, \ldots, \langle F, E_n, I, e\rangle] \rightarrow^{*} [(E'_1, v_1), \ldots, (E'_n, v_n)] \]

or some process diverges.

The proof is simple. Lemma A.4 and the assumption that no process diverges imply that for all $i$

\[ [\langle F, E_i, I, e\rangle] \rightarrow^{*}_{t_i} [I(E'_i, v_i)] = [(E'_i, v_i)] \]

We assume, with no loss of generality, that the sequence of values returned by broadcast is the same for all
evaluations. It then follows from Lemma 3.2 that $t_1 = t_2 = \ldots = t_n$.

The $n$ evaluation sequences can therefore be combined into one common evaluation sequence as they all have
the same synchronization sequence: denoting the $k$'th element of $t_i$ by $t^k_i$, each individual evaluation sequence
can be decomposed as ($t^k_i$ used as an expression stands for the synchronization operation corresponding to the
synchronization letter)


As $t^k_i = t^k_j$ for all $i, j, k$ these individual evaluations can be combined into a global evaluation using the general
interleaving rule for individual processes and the barrier and broadcast rules as follows


[(F,  E  i,  (F,  En,  I,  e>]  [{F,  E\,  C\,t\),  E\,  Ci,t\ 


■  [C?(E?,v?), C™(E™,  <)]  [(E[,v  i), . . . ,  (E'n,vn)} 


This  completes  the  proof. 




B  Implementation  Soundness 

Theorem  3.4  has  four  parts 

1.  The global abstract function environments $G$ of $p$ form a lattice of finite height.

Proof:  There  are  only  a  finite  number  of  functions  in  a  program  and  each  component  of  a  function  signature 
is  a  lattice  of  finite  height. 

2.  I  is  monotonic  in  its  G  argument. 

Proof: We prove that all the results of $I$ are monotonic in both $G$ and $A$. The proof is a straightforward
induction on the structure of expressions. In particular, $\oplus$ is monotonic.

3.  A fixed point $I(G, \emptyset, p) = (G, a, A, s, \mathit{false})$ exists iff a proof can be built with the inference rules of Figure 3.

Proof: Given a proof for a program $p$, an environment $G$ is built from all the assumptions about signatures
embodied in applications of the [LetRec] rule. It is easy to verify that $I(G, \emptyset, p) = (G, a, A, s, \mathit{false})$ for such
a $G$. The only case that can set error to true is an if, which occurs only if $s_2 \sqcup s_3 = f$, which is precluded
by the existence of a proof.

To prove the converse, we consider a slightly less restrictive version of the inference rules of Figure 3: we
remove the $s_2 \sqcup s_3 \prec f$ requirement from [If-Multi]. It is obvious that all proofs in the old system are still
valid in the new one.

Given a fixed point $G$ of $I$, $I(G, \emptyset, p) = (G, a, A, s, \mathit{error})$, it is easy to build a proof in this expanded inference
system:  The  requirements  of  the  [Fun]  rule  are  implied  by  G  being  a  fixed  point,  the  assumptions  needed  for 
[LetRec]  are  read  from  G.  There  is  thus  a  one-to-one  correspondence  between  proofs  in  the  expanded  systems 
and  fixed  points  of  I.  All  fixed  points  for  which  error  =  true  are  valid  in  the  new  system,  but  not  in  the  old, 
while  those  for  which  error  =  false  are  valid  in  both.  If  all  fixed  points  have  error  =  true,  it  will  not  be 
possible  to  build  a  proof  in  the  old  system.  Thus  a  fixed  point  with  error  =  false  exists  iff  a  proof  exists  in 
the  old  system. 

4.  Any proof built from the inference rules defines a global abstract environment $G'$ such that $G \sqsubseteq G'$, where $G$
is the least fixed point of $I$. A proof can be built from $G$ if any proof exists.

Proof: From point 3 it follows that the environment $G'$ defined by any proof is a fixed point $I(G', \emptyset, p) =
(G', a', A', s', \mathit{false})$. The least fixed point $G$ of $I$ satisfies the equation $I(G, \emptyset, p) = (G, a, A, s, \mathit{error})$. By
definition, $G \sqsubseteq G'$. From point 2, we conclude that $\mathit{error} \sqsubseteq \mathit{false}$, i.e. $\mathit{error} = \mathit{false}$. So a proof can be
built from $G$.
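
The four points above are the usual ingredients for computing a least fixed point by iteration: start at the bottom of a finite-height lattice and reapply a monotonic function until nothing changes. The Python sketch below shows only that generic iteration on a toy lattice of sets. The names `least_fixed_point`, `step` and `CALLS` are hypothetical illustrations; they are not the paper's $I$ or $G$.

```python
def least_fixed_point(step, bottom):
    """Reapply a monotonic `step` starting from `bottom` until it stabilises.

    Termination relies on the lattice having finite height, as in point 1.
    """
    current = bottom
    while True:
        following = step(current)
        if following == current:
            return current
        current = following


# Toy lattice (an assumption for illustration): subsets of function names,
# ordered by inclusion. `step` adds the entry point and all direct callees.
CALLS = {"main": {"f"}, "f": {"g"}, "g": set(), "unused": {"main"}}


def step(reached):
    grown = set(reached) | {"main"}
    for name in reached:
        grown |= CALLS[name]
    return frozenset(grown)


reachable = least_fixed_point(step, frozenset())
print(sorted(reachable))
```

Because `step` only ever adds names, it is monotonic, and the iteration stops after at most as many rounds as there are functions.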
C  Examples


This appendix shows the results produced by our inference system on the more complex examples from Figure 1.
The while loops of Figures 1b and 1f are rewritten using letrec so that we can directly apply the rules in Figure 3.
Figure 6 shows the new code.


letrec w1() = if random()
                (barrier; w1())
              else
                0
in w1();

Example (b)

letrec w2() = if (i < 10)
(if (i = 1) barrier;
w2())
work1(); barrier();
else
work2(); barrier();
0
work3();
in w2();
barrier;

Example (f)

Figure 6: Loops rewritten with letrec


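•  Figure 1b fails [If-Multi]. We end up trying to match

\[
\frac{\begin{array}{l}
\{w1 : (), \emptyset \rightarrow +, \emptyset, \bot\}, \emptyset \vdash \texttt{random()} : -, \emptyset, \varepsilon \\
\{w1 : (), \emptyset \rightarrow +, \emptyset, \bot\}, \emptyset \vdash \texttt{(barrier; w1())} : +, \emptyset, b \\
\{w1 : (), \emptyset \rightarrow +, \emptyset, \bot\}, \emptyset \vdash \texttt{0} : +, \emptyset, \varepsilon \\
b \sqcup \varepsilon = f \not\prec f \quad \text{(the rule fails)}
\end{array}}{\vdash \texttt{if (random()) (barrier; w1()) else 0} :\ ?}\ [\text{If-Multi}]
\]

•  Figure 1f succeeds with this signature for w2: $(), (i : +) \rightarrow +, (i : +), f$.

•  Figure 1g successfully passes [If-Multi]

\[
\frac{\begin{array}{l}
\vdash \texttt{random()} : -, \emptyset, \varepsilon \\
\vdash \texttt{(barrier; barrier)} : +, \emptyset, bb \\
\vdash \texttt{(work1(); barrier; work2(); barrier)} : +, \emptyset, bb \\
bb \sqcup bb = bb \prec f
\end{array}}{\vdash \texttt{if (random()) (...) else (...)} : -, \emptyset, bb}\ [\text{If-Multi}]
\]

•  Figure 1h fails because both branches have abstract synchronization sequence $f$

\[
\frac{\begin{array}{l}
\vdash \texttt{random()} : -, \emptyset, \varepsilon \\
\vdash \texttt{(while ...)} : +, \emptyset, f \\
\vdash \texttt{(j = i + 10; ...)} : +, \emptyset, f \\
f \sqcup f = f \not\prec f \quad \text{(the rule fails)}
\end{array}}{\vdash \texttt{if (random()) (...) else (...)} :\ ?}\ [\text{If-Multi}]
\]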
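
The side condition checked in these examples can be sketched in a few lines of Python. The sketch assumes a flat lattice of abstract synchronization sequences: a bottom element, the strings over b and r, and a top element written f in the examples. That shape is inferred from the examples above rather than taken from a definition in this section, and the names `BOTTOM`, `TOP`, `join` and `if_multi_ok` are hypothetical.

```python
BOTTOM = object()  # stands for the bottom element (assumed encoding)
TOP = object()     # stands for the top element, written f in the examples


def join(s, t):
    """Least upper bound in the assumed flat lattice BOTTOM < strings < TOP."""
    if s is BOTTOM:
        return t
    if t is BOTTOM:
        return s
    if s is TOP or t is TOP:
        return TOP
    # Two concrete sequences: equal ones join to themselves, others to TOP.
    return s if s == t else TOP


def if_multi_ok(s2, s3):
    """Side condition of [If-Multi]: the join must be strictly below TOP."""
    return join(s2, s3) is not TOP


print(if_multi_ok("bb", "bb"))  # Figure 1g: both branches synchronize as bb
print(if_multi_ok("b", ""))     # Figure 1b: b against the empty sequence
print(if_multi_ok(TOP, TOP))    # Figure 1h: both branches are already top
```

The three calls correspond to Figures 1g, 1b and 1h: only the first satisfies the condition.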
D  Runtime  Error  Checking


Figure  7  adds  new  semantics  rules  to  C  that  detect  the  following  runtime  errors:  mismatch  of  barrier  and 
broadcast,  and  termination  of  some  processes  while  others  are  waiting  at  a  barrier  or  broadcast. 

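\[ C \in \mathit{Cont} = \mathit{Env} \times N \rightarrow \mathit{State} \]

\[ \mathit{State} = \mathit{FunEnv} \times \mathit{Env} \times \mathit{Cont} \times \mathit{Expression} + \mathit{Env} \times N + \bot \]

\[ [\ldots, \langle F_i, E_i, C_i, \texttt{broadcast}\rangle, \ldots, \langle F_j, E_j, C_j, \texttt{barrier}\rangle, \ldots] \rightarrow [\bot, \ldots, \bot] \]

\[ [\ldots, (E_i, v_i), \ldots, \langle F_j, E_j, C_j, \texttt{barrier}/\texttt{broadcast}\rangle, \ldots] \rightarrow [\bot, \ldots, \bot] \]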


Figure  7:  Semantic  rules  for  runtime  synchronization  error  detection 

Theorem  3.3  is  now  stronger,  as  it  implies  that  an  evaluation  does  not  terminate  as  [_L,  .  .  . ,  _L].  As  it  is  impossible 
to  apply  the  new  semantic  rules  of  Figure  7  to  a  single  process,  Lemma  3.2  is  valid  in  the  new  system,  and  therefore 
so  is  Theorem  3.3. 

As  a  consequence,  barrier  inference  guarantees  that  a  program  cannot  have  a  mismatch  of  a  barrier  or  broadcast 
and  also  that  processes  cannot  wait  at  a  barrier  or  broadcast  when  some  processes  of  the  SPMD  program  have 
terminated.  This  eliminates  the  need  for  runtime  error  checking  of  these  conditions. 
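
To make the two error conditions concrete, the following Python sketch replays per-process synchronization traces in lockstep and reports the first point where a runtime check would fire. It is a simplified simulation under stated assumptions, not the semantics of Figure 7: each process is reduced to a string of synchronization letters (b for barrier, r for broadcast), and `check_traces` is a hypothetical helper.

```python
def check_traces(traces):
    """Replay synchronization traces step by step and report the first error.

    `traces` holds one string over {'b', 'r'} per process. A process whose
    trace is exhausted is treated as terminated.
    """
    longest = max((len(t) for t in traces), default=0)
    for k in range(longest):
        # Operation each process performs at step k (None = terminated).
        ops = {t[k] if k < len(t) else None for t in traces}
        if None in ops:
            return f"step {k}: a process terminated while others wait"
        if len(ops) > 1:
            return f"step {k}: barrier/broadcast mismatch"
    return "ok"


print(check_traces(["brb", "brb"]))  # identical traces
print(check_traces(["bb", "br"]))    # barrier meets broadcast
print(check_traces(["b", "bb"]))     # one process finishes early
```

Programs accepted by barrier inference always produce identical traces on all processes, which is why the text can drop these checks.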
E  Unstructured  Single  Inference


This  appendix  gives  additional  details  and  examples  on  the  inference  of  single-valued  variables  in  unstructured 
control-flow  graphs.  Global  variables  are  considered  as  implicit  arguments  and  results  of  functions  and  are  otherwise 
treated  exactly  as  local  variables. 

Branch  Outcome  Dependences 

A statement $s$ is directly branch dependent on outcome $o$ of branch $b$ if $s \in CD(b)$ and $s$ postdominates $o$, where
$CD(b)$ is the set of statements control-dependent on $b$, and an outcome of a branch is one of its successors.

The branch-dependences relation is the closure of the direct branch dependence relation.

Figure 8 shows the branch dependences for three statements in a simple control-flow graph. Statement s2 is
interesting because it depends on both outcomes of condition a. This captures the intuition that the outcome of
decision a is important to whether statement s2 gets evaluated, in that it determines what other condition (b or
c) gets tested to directly determine whether s2 gets executed or not. All of a, b, c must be single-valued for all
processes in a Split-C program to get the same value of x.


statement              s1: x = 1    s2: x = 2          s3: x = 3
outcome dependencies   a0, b0       a0, a1, b1, c0     a1, c1

Figure 8: Branch outcome dependences


Single-valuedness at φ-functions

The value of a φ-function $\phi(v_1, v_2)$ depends on branch $b$ iff different outcomes for branch $b$ are found in

branch-dependences(definition($v_1$))

and

branch-dependences(definition($v_2$))

where definition($v$) is the statement where $v$ is assigned. The set of branches on which φ-function $s$ depends is
called φ-dependences($s$).

Any φ-function with more than 2 arguments is handled by considering all pairs of variables.

Figure 9 adds some control-flow merges to Figure 8. The branch dependences are:


•  x4 depends on the a and b branches as x1 and x2 have different branch outcomes in their branch dependence
sets. Notice that x4 is not dependent on outcomes of branch c.

•  x5 depends on the a and c branches. It doesn't depend on b directly, but it depends on x4 that depends on
b.
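
The definition of φ-dependences can be sketched directly. The Python below assumes each φ argument is summarised by the branch-dependence set of its defining statement, encoded as (branch, outcome) pairs; that encoding and the name `phi_dependences` are illustrative assumptions. The sets for x1 and x2 are the ones listed for s1 and s2 in Figure 8.

```python
from itertools import combinations


def phi_dependences(arg_outcomes):
    """Branches on which a phi-function depends.

    `arg_outcomes` has one set of (branch, outcome) pairs per phi argument.
    A branch is reported when two arguments carry different outcomes of it.
    More than two arguments are handled by examining every pair.
    """
    deps = set()
    for left, right in combinations(arg_outcomes, 2):
        for branch, outcome in left:
            if any(b == branch and o != outcome for b, o in right):
                deps.add(branch)
    return deps


# Branch-dependence sets of s1 and s2, as listed in Figure 8.
x1 = {("a", 0), ("b", 0)}
x2 = {("a", 0), ("a", 1), ("b", 1), ("c", 0)}

print(sorted(phi_dependences([x1, x2])))  # ['a', 'b']
```

Branch c does not appear in the result because only x2 mentions an outcome of c, so no two differing outcomes of c are found.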
statement              s1: x1 = 1   s2: x2 = 2         s3: x3 = 3
outcome dependencies   a0, b0       a0, a1, b1, c0     a1, c1

x5 = phi(x4, x3)

Figure 9: Branch dependences at φ-functions


Dependence  Sets 

The dependence set for $v$ is the set of variables that must be invariant for $v$ to be invariant. There are three cases:

1.  $v$ is the result of an assignment $v = op(v_1, v_2, \ldots)$. var-dependences($v$) $= \{v_1, v_2, \ldots\}$.

2.  $v$ is the result of an assignment $v = \phi(v_1, v_2, \ldots)$.

\[ \text{var-dependences}(v) = \{v_1, v_2, \ldots\} \cup \Big(\bigcup_{b \in \phi\text{-dependences}(v = \phi(v_1, v_2, \ldots))} \text{branch-variables}(b)\Big) \]

where branch-variables($s$) is the set of variables that determine the outcome of branch statement $s$.

3.  $v$ is assigned in some other fashion (e.g. a function call). $v$ has no var-dependences set.

Building  the  Constraints 

The maximal solution of this set of constraints gives the set of single-valued variables. Variables are either known
to be single-valued, known not to be single-valued, or depend on other variables.

For every variable $v$ that has a dependence set, add the constraint

\[ \Big(\bigwedge_{w \in \text{var-dependences}(v)} w\Big) \Leftrightarrow v \]

A $\mathit{false} \Leftrightarrow v$ constraint is added for every input argument that is not single-valued, and for every function call result that is
not single-valued (global variables are considered implicit arguments to and results of functions).

The language semantics may mandate the addition of other $\mathit{false} \Leftrightarrow v$ constraints, e.g. for pointer dereferences.

Solving  the  Constraints 

The following algorithm finds a maximal solution of the set of constraints S over variables V:

truevars = V
while S contains a constraint 'false <=> v'
    truevars = truevars - { v }
    S = S - { 'false <=> v' }
    replace all constraints 'w1 & ... & wn <=> w' in S whose left hand side
        contains v with 'false <=> w'
end

truevars is the maximal solution.
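
The algorithm above translates almost line for line into a worklist loop. The Python sketch below is one possible rendering under assumed data structures: conjunction constraints are held in a dictionary from the right-hand variable to the set of variables on its left-hand side, and the 'false <=> v' constraints are given as a collection of variables. The name `maximal_solution` and the sample variables are hypothetical.

```python
def maximal_solution(variables, and_constraints, false_vars):
    """Return the variables that remain single-valued in the maximal solution.

    and_constraints: dict mapping w to the set {w1, ..., wn} of the
                     constraint 'w1 & ... & wn <=> w'.
    false_vars:      variables v that have a constraint 'false <=> v'.
    """
    truevars = set(variables)
    pending = {w: set(lhs) for w, lhs in and_constraints.items()}
    worklist = list(false_vars)
    while worklist:
        v = worklist.pop()
        if v not in truevars:
            continue  # this 'false <=> v' constraint was already handled
        truevars.discard(v)
        # Every conjunction that mentions v becomes 'false <=> w'.
        for w, lhs in list(pending.items()):
            if v in lhs:
                del pending[w]
                worklist.append(w)
    return truevars


constraints = {"c": {"a", "b"}, "d": {"c"}, "e": {"a"}}
result = maximal_solution({"a", "b", "c", "d", "e"}, constraints, ["b"])
print(sorted(result))  # ['a', 'e']
```

In the sample, b is known not to be single-valued, which rules out c and then d, while a and e survive.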